Estimation of resolution and covariance for large matrix inversions


Geophys. J. Int. (1995) 121, 409-426

Jie Zhang and George A. McMechan
Center for Lithospheric Studies, The University of Texas at Dallas, PO Box 830688, Richardson, TX 75083-0688, USA

Accepted 1994 September 30. Received 1994 September 30; in original form 1994 April 12.

SUMMARY

Key advantages of conjugate gradient (CG) methods are that they require far less computer memory than full singular value decomposition (SVD), and that iteration may be stopped at any time to give an approximate solution; this means that they may be used to obtain solutions of problems that are too large for SVD. The disadvantage is that CG does not conveniently provide auxiliary information on the quality of the solution (resolution and covariance matrices). This may be overcome by extensions of Paige and Saunders' LSQR algorithm, which is one of the family of CG algorithms. The extensions are produced by analogy with SVD; bidiagonalization in LSQR produces orthonormal basis vectors that can be used to construct solutions and estimates of resolution and covariance. For large problems, for which SVD cannot be performed, the new method provides approximate resolution and covariance estimates that asymptotically approach those of the SVD solutions as the number of iterations increases.

Key words: conjugate gradients, covariance, inversion, LSQR, resolution.

INTRODUCTION

In inverse problems such as tomography, we are interested not only in obtaining a solution but also in estimating the reliability of the solution. There exist several methods that solve an inverse problem and estimate the uniqueness of, and uncertainty in, the solution. A classical method is the generalized inverse, based on singular value decomposition (SVD) of the sensitivity matrix (Jackson 1972; Wiggins 1972; Lee & Stewart 1981). Although commonly used, the generalized inverse method can only be applied to inverse problems with a modest number of unknowns and observations. The practical limitation lies in the CPU memory requirements of the SVD algorithm (Dongarra et al. 1978). In fact, any inversion method that involves explicit multiplication of large matrices is deemed impractical.

For large problems, such as tomographic velocity estimations, which commonly involve hundreds of thousands to a few million unknowns, and millions of observations, row-action methods such as the simultaneous iterative reconstruction technique (SIRT) or conjugate gradient methods such as the least-squares QR decomposition (LSQR) may be used. Both operate on one equation (one row of the matrix) at a time and so require very little computer memory. Although there is no known way to get the data information density matrix, point spread functions may be used to estimate model resolution at a few selected locations (e.g. Clayton & Comer 1984; Humphreys & Clayton 1988; Spakman & Nolet 1988; Brzostowski & McMechan 1992). To obtain the resolution matrix for an entire model requires a number of inversion solutions equal to the number of unknowns; this is computationally prohibitive for large inverse problems (Nolet 1985; Nakanishi & Suetsugu 1986; Trampert & Leveque 1990). Another approach, the jack-knife (e.g. Lees & Crosson 1990), can be used to estimate solution variance, but also involves a number of inversion solutions equal to the number of data partitions.

The LSQR algorithm of Paige & Saunders (1982a,b) is a conjugate gradient type of algorithm (Hestenes & Stiefel 1952).
Faster in convergence than SIRT, LSQR has become popular for obtaining solutions to linear inverse problems (Nolet 1985, 1987; van der Sluis & van der Vorst 1987; Spakman & Nolet 1988; Meyerholtz, Pavlis & Szpakowski 1989; Lees & Crosson 1990; Leiss & Pan 1991). The CPU memory required for LSQR (like SIRT) involves only a few vectors of dimensions equal to those of the observations and unknowns instead of full matrices, so large inverse problems can be solved. The LSQR method bears some very useful similarities to the classical generalized inverse method based on SVD (Spakman & Nolet 1988). It bidiagonalizes the sensitivity matrix with orthonormal vectors in the parameter and data domains (Golub & Kahan 1965; Paige 1974; Bjorck & Eldén 1979); in fact, bidiagonalization serves as the front end for certain SVD algorithms (Golub & Van Loan 1989). Nolet & Snieder (1990) and Berryman (1994a,b) have analytically explored the similarities between LSQR and SVD, but these have not been fully investigated numerically.

Below, we exploit these properties to construct resolution and covariance matrices for very large inversion problems, based on LSQR. The method is demonstrated numerically with synthetic examples. The results are encouragingly similar to those from SVD, but without the large memory requirements of SVD. The use of our extension of the LSQR algorithm allows direct computation of approximate resolution, information density, and covariance matrices for large inverse problems; these were not previously possible with SVD.

THEORETICAL BACKGROUND

Solution of a linear inverse problem may be defined as finding a vector x that best (e.g. in the L1 or L2 sense) satisfies

Ax = b, (1)

where x is an n-dimensional vector containing the unknown model parameters, b is an m-dimensional vector containing observations, and A is an m x n sensitivity matrix connecting model parameters and observations.

SVD solution

A solution (Lanczos 1961; Lee & Stewart 1981) to eq. (1) may be obtained via singular value decomposition of matrix A to give its generalized inverse

A* = V D^-1 U^T, (2)

where U and V are orthonormal matrices of dimensions m x r and n x r, respectively, D is an r x r diagonal matrix with singular values arranged in descending order down the diagonal, D^-1 is the inverse of D, and r is the rank of matrix A. Superscript T represents matrix transposition. To evaluate the uniqueness of, and uncertainty in, the solution, resolution (R), information density (S), and covariance of the model parameters (C) may be constructed (e.g. Wiggins 1972; Aki & Richards 1980; Lee & Stewart 1981; Menke 1984):

R = A*A = V V^T, (3)

S = A A* = U U^T (4)

and

C = A*(A*)^T σ^2 = V [D^-1 (D^-1)^T] V^T σ^2, (5)

where σ is the standard deviation of the data. For example, in a velocity tomography problem, σ would represent the error in time picking.

LSQRA solution

The LSQR algorithm of Paige & Saunders (1982b) avoids storage of the full matrix A when seeking a solution to eq. (1); all matrix operations involve only one row or column at a time. This makes the LSQR algorithm suitable for solving large linear inverse problems that are otherwise intractable. Also, as shown below, it can be extended to compute approximate resolution and covariance matrices for large inverse problems.

Consider now an alternative linear inverse, which we denote LSQRA, constructed through an analogy to the SVD-based generalized inverse. Following Paige (1974) and Paige & Saunders (1982a), the iterative LSQR algorithm is applied to A and b in eq. (1); A is reduced, after k iterations, to a bidiagonal matrix L and two orthonormal matrices U and V. Then B* (an approximate inverse of A) is constructed by analogy to SVD (the Appendix), as

B* = V L^-1 U^T (6a)

and an approximation to A is constructed as

B = U L V^T, (6b)

where U and V are orthonormal matrices of dimensions m x k and n x k, respectively; L is a k x k non-singular lower bidiagonal matrix; L^-1 is the inverse of L; and k is the iteration number. We may now express resolution (R) and information density (S) matrices (B*B and BB*, respectively), based directly on the bidiagonalization decomposition, as

R = B*B = (V L^-1 U^T)(U L V^T) = V V^T (7)

and

S = B B* = (U L V^T)(V L^-1 U^T) = U U^T. (8)

Assuming that the observations are statistically independent and have the same variance σ^2, the parameter covariance matrix

C = B*(B*)^T σ^2 = V [L^-1 (L^-1)^T] V^T σ^2 (9)

can be constructed.
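To make the correspondence concrete, here is a minimal NumPy sketch (ours, not part of the original paper) of the SVD-based quantities in eqs (2)-(5), which eqs (6)-(9) parallel; the 8 x 5 random test matrix is an arbitrary stand-in for A, and the 0.004 s standard deviation is the picking error assumed later in Example 1.

```python
import numpy as np

# Toy illustration of eqs (2)-(5): generalized inverse, resolution,
# information density, and parameter covariance from an SVD of A.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 5))              # m x n sensitivity matrix (illustrative)
sigma = 0.004                            # data standard deviation, in seconds

U, s, VT = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))        # numerical rank of A
U, s, V = U[:, :r], s[:r], VT[:r].T      # keep the r significant singular vectors

A_star = V @ np.diag(1.0 / s) @ U.T                   # eq. (2): A* = V D^-1 U^T
R = V @ V.T                                           # eq. (3): resolution
S = U @ U.T                                           # eq. (4): information density
C = V @ np.diag(1.0 / s**2) @ V.T * sigma**2          # eq. (5): parameter covariance

print(np.allclose(R, np.eye(A.shape[1])))             # full rank here, so R = I
```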
The solution given numerically by LSQR can be expressed analytically in the form x̂ = B*b = V L^-1 U^T b (see Appendix); this explicit form of the solution (LSQRA) has apparently not been previously considered (Nolet 1987, p. 20; Meyerholtz et al. 1989), nor have its consequences been exploited. B and B* also satisfy the Moore-Penrose conditions (see Appendix). If only a solution is desired, LSQR is sufficient. LSQRA will always take more computer time than LSQR because of the multiplications in eq. (A8), but this is a small fraction of the total computation. LSQRA is a viable approach to also obtaining approximate resolution and covariance matrices at any iteration. Nolet & Snieder (1990) provide a similar formulation, which uses a continuous rather than discrete parameterization. Berryman (1994a,b) uses a form equivalent to eq. (7), but does not address the question of loss of orthogonality as discussed below.

Numerical considerations

The bidiagonalization procedure used in LSQR has a distinct numerical property: a loss of orthogonality among the 'Lanczos vectors' (the columns of U and V) accompanies the convergence of the solution. This loss of orthogonality is an intrinsic property of LSQR, and cannot be avoided by higher precision in the numerical calculations (Golub & Van Loan 1989, pp. 486-488).

One approach to this problem is to reorthogonalize the Lanczos vectors, which involves a substantial decrease in overall efficiency. With additional iterations beyond the loss of orthogonality, duplicate (as well as new) singular values of the lower bidiagonal matrix L will occur. It is necessary for stability of the LSQRA solution, and for the resolution and covariance estimates, that the duplicate singular values be removed, or that iteration be stopped when the loss of orthogonality occurs. We chose the former, as better (i.e. closer to SVD) results could be obtained by further iteration. It is more efficient to identify and remove the duplicate singular values and their associated eigenvectors than to reorthogonalize. Scales (1989) shows how to obtain a singular value spectrum and identify duplicates using a conjugate gradient method with virtually no extra effort.

We applied LSQR (the Paige & Saunders (1982a,b) version) to A, and saved the Lanczos vectors and L on disk; after this point, we diverge from the LSQR algorithm. As L has dimensions (k) equal to the number of iterations (usually 10-100), its inverse L^-1 can be readily computed. We used SVD to diagonalize matrix L and reorganize the basis vectors U and V, as follows. With

L = P D Q^T and L^-1 = Q D^-1 P^T,

where P and Q are k x k orthonormal matrices, and D is a k x k diagonal matrix, eqs (6a), (6b), (7), (8) and (9) become, respectively,

B* = (VQ) D^-1 (UP)^T, (10a)

B = (UP) D (VQ)^T, (10b)

R = (VQ)(VQ)^T, (11)

S = (UP)(UP)^T (12)

and

C = (VQ) [D^-1 (D^-1)^T] (VQ)^T σ^2. (13)

VQ and UP are reorganized orthonormal matrices in the parameter and data domains, respectively. Note this is not reorthogonalization as in Parlett & Scott (1979), but is similar to the procedure of Bjorck (1988). Eqs (10a), (11), (12) and (13) are identical in form to the generalized inverse equations (2), (3), (4) and (5). The diagonal matrix (D) contains the singular values of L, which are approximate singular values of the original matrix A, arranged along the diagonal in descending order. Scales (1989) provides another approach to computation of singular values in conjugate gradient solutions for tridiagonal matrices. To the extent that only k independent components are represented, B may be viewed as a filtered version of A. We may exclude small singular values to control the trade-off between resolution and solution error. With eqs (11), (12) and (13), we can construct approximate resolution, information density, and covariance matrices for large inverse problems that are intractable using the full SVD-based generalized inverse. This is the main contribution of this paper.
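The procedure can be summarized in a short Python sketch (ours, not the authors' code); the dense toy matrix, the random data, and the helper name gk_bidiag are illustrative assumptions, and the sketch omits the reorthogonalization and duplicate-value removal discussed above.

```python
import numpy as np

# k steps of Golub-Kahan bidiagonalization started from b (as in LSQR) give
# Lanczos bases U (m x k), V (n x k) and a lower bidiagonal L (k x k); an SVD
# of the small L then yields the reorganized bases VQ and UP of eqs (10a)-(13).
def gk_bidiag(A, b, k):
    m, n = A.shape
    U = np.zeros((m, k))
    V = np.zeros((n, k))
    alpha = np.zeros(k)                  # diagonal of L
    beta = np.zeros(k)                   # subdiagonal of L (beta[0] unused)
    u = b / np.linalg.norm(b)
    v = A.T @ u
    alpha[0] = np.linalg.norm(v)
    v = v / alpha[0]
    U[:, 0], V[:, 0] = u, v
    for i in range(1, k):
        u = A @ v - alpha[i - 1] * u
        beta[i] = np.linalg.norm(u)
        u = u / beta[i]
        v = A.T @ u - beta[i] * v
        alpha[i] = np.linalg.norm(v)
        v = v / alpha[i]
        U[:, i], V[:, i] = u, v
    L = np.diag(alpha) + np.diag(beta[1:], -1)
    return U, V, L

# Toy problem standing in for eq. (1); the sizes mimic Example 1 below.
rng = np.random.default_rng(1)
A = rng.normal(size=(96, 16))
b = A @ rng.normal(size=16)
sigma, k = 0.004, 9

U, V, L = gk_bidiag(A, b, k)
P, d, QT = np.linalg.svd(L)                       # L = P D Q^T
VQ, UP = V @ QT.T, U @ P                          # reorganized orthonormal bases

x_hat = V @ np.linalg.solve(L, U.T @ b)           # eq. (6a): x = B*b = V L^-1 U^T b
R_approx = VQ @ VQ.T                              # eq. (11): approximate resolution
C_approx = VQ @ np.diag(1.0 / d**2) @ VQ.T * sigma**2   # eq. (13): approximate covariance
```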
NUMERICAL EXAMPLES

In this section, the LSQRA method is illustrated with three synthetic traveltime tomography examples, and the results are compared with those of SVD, SIRT, and LSQR.

Example 1: four-sided geometry

The model for the first example (Fig. 1) is parameterized as a 4 x 4 discrete pixel array and so has 16 unknowns. Four sources and receivers are located along each of the four edges, one station in each pixel, giving 96 traveltime observations. We used the two-point ray-tracing method of Um & Thurber (1987) to generate synthetic traveltime data, and save ray segment lengths in each pixel to form A in eq. (1); i.e. for convenience, initially we investigate only a pure linear problem. We assume a reference model of constant velocity (2.0 km s^-1), and so reduce the traveltime observations to time residual data. Slowness perturbations are used to describe the model parameters, so the problem becomes solving the linear system (1), where x is the slowness perturbation vector (of length 16) to be solved, b is the traveltime residual data vector (of length 96), and A is a 96 x 16 rectangular matrix, each row of which consists of the distance segments traversed by a ray. After eq. (1) is solved, the obtained slowness perturbations are combined with the reference model to construct the updated velocity model.

Figure 1 also shows solutions by SIRT, SVD, LSQR, and the proposed LSQRA methods from a starting model with constant velocity of 2.0 km s^-1. Table 1 shows the singular values of matrix A included in the construction of the SVD and LSQRA solutions. LSQR-9 corresponds to the loss of orthogonality (when the projection of the current Lanczos vector onto any previous one exceeds 1 per cent). At iteration 9, the smaller singular values in LSQRA are only approximate (compare with SVD-16), and a few others are still missing. By iteration 24, all 16 singular values are recovered by LSQRA. Using the LSQR or LSQRA eigenstructures that are the best fit to all the data at any iteration provides a better solution than partial SVD eigenstructures for the same number of similar-sized singular values. This is demonstrated by comparison of the SVD-16(9), LSQR-9 and LSQRA-9(9) solutions in Fig. 1.

Figure 2 shows the resolution matrices computed using SVD and LSQRA. For comparison, we also show the normalized ray-density distribution because it helps to understand resolution and may, when the ray paths are independent, also be considered as a measure of resolution. The complete SVD and LSQRA solutions (with 16 singular values) show an identity diagonal matrix, confirming that all model parameters (slowness perturbations) are uniquely determined. The resolution matrices of SVD-16(9) and LSQRA-9(9) both show a clear diagonal trend with smaller positive and negative undulations in the off-diagonal positions. Still, the diagonal values for the two high and low anomalies in LSQRA-9(9) are equal to, or larger than, their respective values in SVD-16(9), which is consistent with the solution results in Fig. 1.

Figure 3 shows the sixth rows of the resolution matrices in Fig. 2, and the SIRT point spread function, for the high-velocity pixel in Fig. 1. Each value in this row is plotted at its physical location in the model so that the spatial relations may be examined; this format is also used in displaying matrix rows in the other two examples below. From the definition of the resolution matrix (x̂ = Rx), each row of the resolution matrix relates one parameter to all the other parameters in the model.
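The 'plotted at its physical location' format amounts to reshaping one row of R onto the model grid; a short NumPy fragment (ours, with an arbitrary random matrix standing in for A and a truncated spectrum as in SVD-16(9)):

```python
import numpy as np

# Reshape one row of a partial-SVD resolution matrix onto the 4 x 4 pixel grid,
# so each value sits at its pixel location (the format used in Fig. 3).
rng = np.random.default_rng(3)
A = rng.normal(size=(96, 16))            # illustrative stand-in for the ray matrix
U, s, VT = np.linalg.svd(A, full_matrices=False)
V9 = VT[:9].T                            # keep 9 of 16 singular values
R = V9 @ V9.T                            # eq. (3) with a truncated spectrum
row_map = R[5].reshape(4, 4)             # row 6 of R, viewed on the model grid
print(np.round(row_map, 3))
```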

Figure 1. Example 1. The correct solution (upper left) has two velocity anomalies (2.5 and 1.5 km s^-1) superimposed on a constant-velocity background (2.0 km s^-1). Also shown are solutions by SIRT, SVD, LSQR, and the proposed LSQRA method. In each solution the number after the dash is the number of iterations performed for SIRT and LSQR (or singular values determined for SVD) and the number in parentheses is the number of singular values used to reconstruct the solution. See Table 1.

Table 1. Singular values for the four-sided survey geometry. The SVD-16(9) column lists all 16 singular values of A; the LSQRA-9(9) column lists the nine values recovered after nine iterations; the LSQRA-24(16) column lists the 24 values obtained after 24 iterations, including duplicates (and one very small value) that were not used in the construction of the LSQRA solution.

SVD-16(9):    0.89702, 0.57391, 0.55723, 0.48356, 0.42230, 0.40080, 0.38059, 0.36200, 0.34547, 0.31988, 0.31911, 0.30549, 0.28829, 0.27096, 0.26674, 0.25702
LSQRA-9(9):   0.89702, 0.57391, 0.55723, 0.48297, 0.42006, 0.38141, 0.31862, 0.27136, 0.25629
LSQRA-24(16): 0.89702, 0.89702, 0.89701, 0.89701, 0.57391, 0.57391, 0.55724, 0.55724, 0.48357, 0.48356, 0.42230, 0.42193, 0.40080, 0.38058, 0.36200, 0.34547, 0.31988, 0.31911, 0.30549, 0.28829, 0.27097, 0.26674, 0.25702, 0.00000

The less-than-ideal distribution for SVD-16(9) and LSQRA-9(9) suggests that the model parameters, instead of being uniquely determined, are mutually dependent because of the incompleteness of the eigenstructures used (9 out of 16). The SIRT point spread response also does not reach unity at this point, but is a reasonable approximation. As the number of eigenvalues determined becomes more complete, the parameters become progressively better resolved.

To examine the model error caused by data errors, we assumed a 0.004 s standard deviation in time residual data (a time picking error) and computed the covariance matrices. For ease of comparison with the velocity model, each covariance matrix of slowness perturbations was converted into a velocity error matrix. The diagonal values are the standard deviation in velocity (in km s^-1) in each pixel of the model; for any row, the off-diagonal values show the correlation between the error in the pixel corresponding to the diagonal element, and the error in all the other pixels. When each value in a row is plotted at its physical location (as for the resolution matrices described above) we refer to it as an error correlation plot. Figures 4 and 5 show the full velocity error matrix and its sixth row, respectively. As in the solution and resolution, the model errors in the complete (16 singular values) solutions are virtually identical. Figs 2 and 4 (or 3 and 5), taken together, demonstrate the trade-off between resolution and model error; better resolution inevitably corresponds to larger model error (Jackson 1972).

Example 2: cross-hole geometry

The second example has the same velocity model as the first (Fig. 1), but a cross-hole survey recording geometry is used. Four sources and four receivers are equally spaced along the two opposite vertical edges of the model, so the number of traveltime observations is 16, making the matrix A in eq. (1) of dimensions 16 x 16. This geometry gives a smaller angular aperture, which reduces the ability to recover the model (McMechan 1983). This effect is manifested quantitatively by the larger condition number of A in this example (1488.1) than in the previous example (3.5) (Tables 1 and 2). A larger condition number corresponds to greater sensitivity to data error or greater ill-posedness of the inverse problem (Lee & Stewart 1981). Figure 6 shows complete and partial SVD, LSQR and LSQRA solutions for the cross-hole experiment; Fig. 7 shows the corresponding resolution matrices and velocity error matrices. Loss of orthogonality occurs at iteration 8 in the LSQR and LSQRA solutions.
Again, the full solutions are indistinguishable (compare SVD-16(16) and LSQRA-50(16) in Fig. 6); the partial solutions, and the resolution and model error estimates, asymptotically approach those for the full solutions (Figs 6 and 7). The smallest singular values were removed from the SVD and LSQRA spectra (Table 2) to obtain SVD-16(13) and LSQRA-20(13) because their inclusion would generate unacceptably large velocity errors. Solutions, resolutions, and error matrices of SVD-16(13) and LSQRA-20(13) appear identical (Figs 6 and 7). Compared with the four-sided survey (Figs 4 and 5), the cross-hole survey tends to give a larger maximum velocity error for a comparable number of singular values or iterations, reflecting the effect of ray angular coverage and consequently greater ill-posedness of eq. (1).

Example 3: large cross-hole model

The model for the third example (Fig. 8) consists of a circular feature and a double-wedge. Sources and receivers are equally spaced in boreholes along the two vertical edges. The matrix A in eq. (1) now has dimensions of 40000 (observations) x 7500 (unknowns), which cannot be stored in CPU memory to apply the standard SVD algorithm. The LSQRA method provides a solution and the associated resolution and error estimates. As LSQR and LSQRA iterations proceed (Fig. 8), the details of the model (e.g. the boundaries between the high- and low-velocity areas) improve, and the interior velocities become closer to the correct values, because more eigenstructures are included. LSQR and LSQRA converge faster than SIRT, probably because of the more optimal use of conjugate updating directions (U and V) in LSQR (van der Sluis & van der Vorst 1987, 1990; Claerbout 1992).

Figure 9 contains resolution and velocity error matrices for the LSQRA-41(32) solution. The diagonal trends in both resolution and velocity error matrices are clearly visible. Periodicity in both plots reflects the way in which the parameters of the 2-D model are arranged along the 1-D matrix rows (or columns). The presence of off-diagonal values in the resolution plot reveals mutual dependence among parameters.
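For problems of this size only the solution itself would normally be computed with an off-the-shelf sparse LSQR; a scaled-down illustration (ours, using SciPy and arbitrary toy dimensions) is the following.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

# For large sparse systems, A is kept in sparse storage and LSQR is stopped
# after a chosen number of iterations. Note that scipy's lsqr does not return
# the Lanczos vectors, so the LSQRA resolution/covariance construction needs a
# custom bidiagonalization loop such as the one sketched earlier.
A = sparse_random(4000, 750, density=0.01, random_state=0, format='csr')  # toy sizes
b = A @ np.ones(750)
x = lsqr(A, b, iter_lim=41)[0]    # stop after 41 iterations, as in LSQRA-41(32)
```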



Figure 4. The velocity error matrix. This is the signed square root of the absolute value of the covariance matrix after converting slowness to velocity. LSQRA-9(9) is a better approximation to the full solutions (SVD-16(16) and LSQRA-24(16)) than SVD-16(9) is. The sixth rows of these matrices are plotted in Fig. 5.
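The conversion the caption refers to can be sketched as follows (our reading, assuming the linearized mapping dv ≈ -v0^2 ds about the 2.0 km s^-1 reference velocity; the paper does not spell out its exact formula):

```python
import numpy as np

# Hedged sketch: map a covariance matrix of slowness perturbations into a
# "velocity error matrix". With dv ~= -v0**2 * ds about a constant reference
# velocity v0, the covariance scales by v0**4; the plotted quantity is then the
# element-wise signed square root of the absolute value.
def velocity_error_matrix(C_slowness, v0=2.0):
    C_vel = v0**4 * C_slowness
    return np.sign(C_vel) * np.sqrt(np.abs(C_vel))   # km/s if C_slowness is in (s/km)^2
```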

Figure 5. The sixth rows of the matrices in Fig. 4. These show the spatial relations (correlations) between the model errors in all other pixels, and that of the highest velocity pixel (the darkest one in each plot).

Table 2. Singular values for the cross-hole geometry. The smallest values in the SVD-16(13) column, and the duplicate values in the LSQRA-20(13) column, are omitted in the model reconstructions in Fig. 6.

SVD-16(13):   0.49094, 0.33597, 0.27492, 0.22164, 0.21547, 0.20207, 0.18124, 0.17417, 0.98668E-01, 0.75447E-01, 0.51330E-01, 0.28262E-01, 0.19069E-01, 0.31522E-02, 0.13010E-02, 0.32991E-03
LSQRA-8(8):   0.49094, 0.33597, 0.27491, 0.21965, 0.19644, 0.17428, 0.96353E-01, 0.31827E-01
LSQRA-20(13): 0.49094, 0.49094, 0.49094, 0.33597, 0.33597, 0.27492, 0.27492, 0.22236, 0.22164, 0.21547, 0.20207, 0.18124, 0.17417, 0.15307, 0.98668E-01, 0.75449E-01, 0.51328E-01, 0.28263E-01, 0.19068E-01, 0.29031E-02

The relatively large off-diagonal values in the velocity error plot suggest that parameter errors correlate over long distances. Physically, this is attributable to data projection along ray paths inherent in the inversion. Figure 10 shows the diagonal elements of the resolution and velocity error matrices. The trade-off between resolution and error is manifested by higher resolution and larger errors as iterations proceed. Larger resolution values and velocity errors are concentrated near major velocity boundaries where ray densities are higher.

Figure 11 shows the resolution and error correlation for two points in the model. In each resolution distribution, spreads of energy pass through the point being examined. Secondary patterns, related to the survey geometry and velocity distribution, are also present. For both points, the velocity error also correlates with other parts of the model through ray crossings. It is interesting that the resolution has high amplitude in regions away from the point being considered, in contrast, for example, to the expected point spread function. The reason is that a point spread considers only first-order effects, and so, by definition, peaks at the point of interest; the corresponding resolution matrix takes into account all the higher order interactions. In general, each point influences all others.

To complete the analysis of this example, Fig. 12 contains the singular value spectrum, and Fig. 13 contains every fourth eigenvector in the LSQRA-41(32) solution. The singular value spectrum smoothly decreases, so there is no obvious discontinuity at which to truncate small values. The eigenvectors in Fig. 13 (columns of VQ in eq. (10a)) show a good correlation with structural features and boundaries in the solution (Fig. 8). Artefacts visible in each eigenvector tend to cancel upon superposition of the corresponding partial solutions. The level of detail increases as the magnitude of the singular value decreases; thus the low-wavenumber features of the solution are provided by the large singular values, and high-wavenumber features by the small singular values.

DISCUSSION

When a matrix is large and sparse, bidiagonalization via the Householder transformation, on which standard SVD algorithms are based, ceases to be computationally feasible, and row-action algorithms like LSQR are necessary. None the less, the concepts of linear inverse theory can still be implemented numerically using the LSQR method in the same way as SVD. This is the essential interpretation from which the LSQRA method is extended to compute resolution and covariance matrices. If only a solution is required, LSQR by itself is sufficient. We used SVD to invert L so that the analogy between LSQRA and SVD emerges; singular values and orthonormal bases all result naturally. By using SVD on L instead of on A, the matrix dimension problem is transferred from CPU memory to hard disk.
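Because L is lower bidiagonal, L^-1 never needs to be formed explicitly; a forward-substitution sketch (ours, not the paper's code) that applies L^-1 to a vector in O(k) operations:

```python
import numpy as np

# Solve L y = z where L is k x k lower bidiagonal with diagonal alpha (length k)
# and subdiagonal beta (length k-1). Forward substitution costs O(k), so the
# LSQRA solution x = V L^-1 U^T b can be formed without inverting L.
def solve_lower_bidiagonal(alpha, beta, z):
    k = len(alpha)
    y = np.zeros(k)
    y[0] = z[0] / alpha[0]
    for i in range(1, k):
        y[i] = (z[i] - beta[i - 1] * y[i - 1]) / alpha[i]
    return y
```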
Obviously, the problem of limited CPU memory will arise again when the iteration number becomes too large. However, the number of iterations performed in a linear inversion is usually less than 10^3; for this size, inversion of L is tractable by SVD. Other methods such as Gaussian reduction can be used to invert L in eqs (7), (8) and (9), for L of larger dimensions.

We found it numerically unstable to include duplicate singular values, or to use the Lanczos vectors in eq. (7) directly instead of the reorganized basis vectors VQ in eq. (11), to construct the solution, resolution, and covariance. By removing duplicate singular values and using the reorganized basis vectors, stability is achieved. There are other methods that keep Lanczos vectors orthogonal through iterations at the cost of increased algorithm complexity (e.g. Parlett & Scott 1979; Golub & Van Loan 1989).

Strictly speaking, the parameter resolution matrix R is defined as B*A instead of B*B. However, because A^T U = V L^T (eq. (2.4) of Paige, 1974),

R = B*A = V L^-1 U^T A = V L^-1 L V^T = V V^T = B*B,

which is eq. (7). We have numerically confirmed this relation. We did not study the information density matrix (Aki & Richards 1980; Lee & Stewart 1981), although it can also be computed by the LSQRA method (eqs 8 and 12), or by direct multiplication A B*.

CONCLUSION

Paige & Saunders' (1982a,b) LSQR algorithm can be used to obtain singular values and orthonormal basis vectors and to produce a new (LSQRA) solution directly. Approximate model resolution and covariance matrices can also be constructed using the same eigenstructures. The number of LSQRA iterations defines the number of singular values and basis vectors used in the solution. Approximate resolution and covariance matrices are available at every iteration; with increasing iterations these asymptotically approach those for the full SVD solution.

Figure 6. Example 2. The correct solution (upper left) is the same as in Fig. 1. The sources and receivers are equally spaced on the two vertical edges. Notation is the same as in Fig. 1.




Figure 11. Resolution (above) and error correlation (below) for two representative locations (indicated by the arrow tips). Each of these corresponds to one row in the LSQRA-41(32) matrices in Fig. 9.

Figure 12. Normalized singular value spectrum for the LSQRA-41(32) solution in Fig. 8. Numbers placed next to the larger singular values are the number of duplicates at those points.


ACKNOWLEDGMENTS

The research leading to this paper was funded by the NSF under grant EAR-9204610 and by the Sponsors of the UT Dallas Geophysical Consortium. The authors acknowledge helpful comments from J. VanDecar, R. Snieder and an anonymous reviewer. Computations were performed on a Convex C-3 at the University of Texas at Dallas. The manuscript was expertly typed by Charlotte Stromer. Contribution No. 791 from the Program in Geosciences at the University of Texas at Dallas.

REFERENCES

Aki, K. & Richards, P., 1980. Quantitative Seismology, Vol. 2, Freeman, San Francisco.
Berryman, J.G., 1994a. Resolution of iterative inverses in seismic tomography, in Proc. Cornelius Lanczos International Centenary Conference, pp. 297-299, eds Brown, J.D., Chu, M.T., Ellison, D.C. & Plemmons, R.J., SIAM, Philadelphia.
Berryman, J.G., 1994b. Tomographic resolution without singular value decomposition, in Mathematical Methods in Geophysical Imaging II, Proc. SPIE, 2301, pp. 2-13, ed. Hassanzadeh, S., SPIE, Bellingham.
Bjorck, A., 1988. A bidiagonalization algorithm for solving large and sparse ill-posed systems of linear equations, BIT, 28, 659-670.
Bjorck, A. & Eldén, L., 1979. Methods in Numerical Algebra for Ill-posed Problems, Technical Report LiTH-MATH-R-33-1979, Linkoping University.
Brzostowski, M.A. & McMechan, G.A., 1992. 3-D tomographic imaging of near-surface seismic velocity and attenuation, Geophysics, 57, 396-403.
Claerbout, J.F., 1992. Earth Sounding Analysis: Processing Versus Inversion, Blackwell Scientific Publications, Boston.
Clayton, R.W. & Comer, R.P., 1984. A tomographic analysis of mantle heterogeneities, Terra Cognita, 4, 282-283.
Dongarra, J., Bunch, J.R., Moler, C.B. & Stewart, G.W., 1978. LINPACK Users Guide, SIAM Publications, Philadelphia.
Golub, G.H. & Kahan, W., 1965. Calculating the singular values and pseudo-inverse of a matrix, SIAM J. Numer. Anal., 2, 205-224.
Golub, G.H. & Van Loan, C.F., 1989. Matrix Computations, 2nd edn, Johns Hopkins University Press, Baltimore.
Hestenes, M.R. & Stiefel, E., 1952. Methods of conjugate gradients for solving linear systems, J. Res. Natl. Bur. Stand., 49, 409-436.
Humphreys, E. & Clayton, R.W., 1988. Adaptation of back projection tomography to seismic traveltime problems, J. geophys. Res., 93, 1073-1085.
Jackson, D.D., 1972. Interpretation of inaccurate, insufficient, and inconsistent data, Geophys. J. R. astr. Soc., 28, 97-109.
Lanczos, C., 1961. Linear Differential Operators, Chap. 3, Van Nostrand, London.
Lee, W.H.K. & Stewart, S.W., 1981. Principles and Applications of Microearthquake Networks, Academic Press, New York.
Lees, J.M. & Crosson, R.S., 1990. Tomographic imaging of local earthquake delay times for three-dimensional velocity variation in western Washington, J. geophys. Res., 95, 4763-4776.
Leiss, E.L. & Pan, J.-M., 1991. Inverse techniques in geophysical tomography: a comparison of noisy data, in Expanded Abstracts, 61st Annual Int. Meeting, pp. 732-736, Society of Exploration Geophysicists, Houston.
McMechan, G.A., 1983. Seismic tomography in boreholes, Geophys. J. R. astr. Soc., 74, 601-612.
Menke, W., 1984. Geophysical Data Analysis: Discrete Inverse Theory, Academic Press, New York.
Meyerholtz, K.A., Pavlis, G.L. & Szpakowski, S.A., 1989. Convolutional quelling in seismic tomography, Geophysics, 54, 570-580.
Nakanishi, I. & Suetsugu, D., 1986. Resolution matrix calculated by a tomographic inversion method, J. Phys. Earth, 34, 95-99.
Nolet, G., 1985. Solving and resolving inadequate and noisy tomographic systems, J. Comput. Phys., 61, 463-482.
Nolet, G., 1987. Seismic wave propagation and seismic tomography, in Seismic Tomography, pp. 1-23, ed. Nolet, G., Reidel, Dordrecht.
Nolet, G. & Snieder, R., 1990. Solving large linear inverse problems by projection, Geophys. J. Int., 103, 565-568.
Paige, C.C., 1974. Bidiagonalization of matrices and solution of linear equations, SIAM J. Numer. Anal., 11, 197-209.
Paige, C.C. & Saunders, M.A., 1982a. LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Trans. Math. Software, 8, 43-71.
Paige, C.C. & Saunders, M.A., 1982b. Algorithm 583 LSQR: sparse linear equations and least square problems, ACM Trans. Math. Software, 8, 195-209.
Parlett, B.N. & Scott, D.S., 1979. The Lanczos algorithm with selective orthogonalization, Math. Comput., 33, 217-238.
Scales, J.A., 1989. On the use of conjugate gradient to calculate the eigenvalues and singular values of large, sparse matrices, Geophys. J., 97, 179-183.
Spakman, W. & Nolet, G., 1988. Imaging algorithms, accuracy and resolution in delay time tomography, in Mathematical Geophysics, pp. 155-187, eds Vlaar, N.J., Nolet, G., Wortel, M.J.R. & Cloetingh, S.A.P.L., Reidel, Dordrecht.
Trampert, J. & Leveque, J., 1990. Simultaneous iterative reconstruction technique: physical interpretation based on the generalized least-squares solution, J. geophys. Res., 95, 12553-12559.
Um, J. & Thurber, C., 1987. A fast algorithm for two-point seismic ray tracing, Bull. seism. Soc. Am., 77, 972-986.
van der Sluis, A. & van der Vorst, H.A., 1987. Numerical solution of large, sparse linear algebraic systems arising from tomographic problems, in Seismic Tomography, pp. 48-83, ed. Nolet, G., Reidel, Dordrecht.
van der Sluis, A. & van der Vorst, H.A., 1990. SIRT- and CG-type methods for the iterative solution of sparse linear least-squares problems, Linear Algebra and its Applications, 130, 257-303.
Wiggins, R.A., 1972. The general linear inverse problem: implication of surface waves and free oscillations for earth structure, Rev. Geophys. Space Phys., 10, 251-285.

APPENDIX A

The derivation of Paige & Saunders' (1982a) LSQR algorithm consists of two parts: first, bidiagonalization of matrix A (Golub & Kahan 1965; Paige 1974) and, second, QR decomposition of the lower bidiagonal matrix L (Paige & Saunders 1982a). Numerical implementation of the LSQR algorithm is given in detail by Paige & Saunders (1982a,b). Here we sketch their procedure to show that a new numerical solution (LSQRA), implied by their procedure, is identical to that of multiplying the inverse B* by the observation vector b.

Following Paige & Saunders (1982a), suppose the minimum least-squares solution has the form

x = Vy, (A1)

where V is the n x k orthonormal matrix in the model domain, y is a vector of length k, and x is the solution vector of length n. Then, the problem of minimizing ||b - Ax|| becomes

min ||β1 e1 - Ly||_2, (A2)

where L is the k x k lower bidiagonal matrix obtained using LSQR, β1 is a scalar equal to the norm of b, e1 is a unit vector equal to the first column of a k x k identity matrix, and

β1 e1 = U^T b, (A3)

where U is the m x k orthonormal matrix in the data domain, and U^T is the transpose of U. Applying the QR decomposition to the lower bidiagonal matrix L and the vector β1 e1 (Golub & Van Loan 1989), we obtain

Q L = R (A4)

and

Q (β1 e1) = c, (A5)

where R is an upper bidiagonal matrix, Q is the transforming orthogonal matrix, and c is the transformed data vector. Matrices Q, L and R are full-rank square matrices. Now, the minimization problem (A2) translates into finding y from

R y = c. (A6)

Because R is full rank, we obtain y in eq. (A6) and a solution to the minimization problem (A2). The ingenuity of Paige & Saunders' LSQR algorithm lies in their numerical procedure that combines the matrix bidiagonalization and the finding of the solution in a single iterative process requiring minimal CPU memory.

In analytical form, the solution can be expressed by applying the inverse B* to b. By combining eqs (A1)-(A6), the minimum least-squares solution given by LSQR becomes

x̂ = Vy = V R^-1 c = V L^-1 Q^-1 Q (β1 e1), (A7)

where R^-1, L^-1 and Q^-1 are the inverses of R, L and Q, respectively, and Q satisfies Q Q^-1 = Q^-1 Q = I. With eq. (A3) and Q^-1 Q = I, eq. (A7) becomes

x̂ = V L^-1 (β1 e1) = V L^-1 U^T b. (A8)

Eq. (A8) gives the numerical solution of the LSQR algorithm by an analytical formula, which provides the mathematical basis for forming eqs (6a) and (6b). This formulation has apparently not been used before, probably because it still involves matrix multiplication; these matrices are, however, much smaller than those of the original problems being solved.

Because of eqs (6a) and (6b), the following relations hold:

B B* B = (U U^T) U L V^T = U (U^T U) L V^T = U L V^T = B,

B* B B* = (V V^T) V L^-1 U^T = V (V^T V) L^-1 U^T = V L^-1 U^T = B*,

(B B*)^T = (U U^T)^T = (U^T)^T U^T = U U^T = B B*

and

(B* B)^T = (V V^T)^T = (V^T)^T V^T = V V^T = B* B.

Therefore, B and B* also satisfy the Moore-Penrose conditions.
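These relations are easy to verify numerically; an illustrative fragment (ours, reusing the gk_bidiag sketch given earlier and arbitrary toy dimensions):

```python
import numpy as np

# Check the four Moore-Penrose-type relations for B = U L V^T and
# B* = V L^-1 U^T built from k bidiagonalization steps (gk_bidiag as above).
rng = np.random.default_rng(2)
A = rng.normal(size=(30, 12))
b = A @ rng.normal(size=12)
k = 6

U, V, L = gk_bidiag(A, b, k)
B = U @ L @ V.T
B_star = V @ np.linalg.solve(L, U.T)   # V L^-1 U^T without forming L^-1 explicitly

for lhs, rhs in [(B @ B_star @ B, B),
                 (B_star @ B @ B_star, B_star),
                 ((B @ B_star).T, B @ B_star),
                 ((B_star @ B).T, B_star @ B)]:
    print(np.allclose(lhs, rhs))
```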