
Computing tomographic resolution matrices using Arnoldi's iterative inversion algorithm

James G. Berryman

berryman@sep.stanford.edu

ABSTRACT

Resolution matrices are useful in seismic tomography because they allow us to evaluate the information content of reconstructed images. Techniques based on the multiplicity of equivalent exact formulas that may be used to define the resolution matrices have been used previously by the author to design algorithms that avoid the need for any singular value decomposition of the ray-path matrix. An explicit procedure is presented for computing both model and data resolution matrices using Arnoldi's algorithm for iterative inversion in seismic tomography. Arnoldi's method differs from the Lanczos scheme by including explicit reorthogonalization of basis vectors. Some convenient notation is introduced to permit ready comparison of Arnoldi's method with the Lanczos approach. Arnoldi's method requires greater storage of basis vectors but completely overcomes the lack of basis vector orthogonality, which is the major practical limitation of the Lanczos method.

INTRODUCTION

In an earlier contribution to SEP-77 (Berryman, 1993), I discussed general methods of constructing resolution matrices for iterative inverses and applied these techniques to both Lanczos and Paige-Saunders (LSQR) algorithms (Lanczos, 1950; Paige and Saunders, 1982). The comment was made that those methods could be applied to a wide variety of iterative inversion methods. To illustrate their general utility, I show here how they can be applied to Arnoldi's algorithm for iterative inversion. This algorithm may be viewed as a variant of the Lanczos algorithm, but with reorthogonalization of the basis vectors explicitly built in. I will first review the Lanczos method and then present the Arnoldi approach using a similar notation. The similarities and differences of the two techniques should then become clear.

REVIEW OF RESOLUTION MATRICES

Discussions of resolution played a central role in the classic paper of Aki et al. (1977) on large scale seismic tomography. Such discussions continue to play a key role in the interpretation of inversion results in seismology. For example, in their recent review article -- which also includes an extensive discussion of resolution in seismic inverse problems -- Evans and Achauer (1993) state that ``$\ldots$ evaluating the whole resolution matrix, not just its diagonal elements, is a required part of interpreting [the reconstructed model].'' Indeed, any inverse method based on incomplete or imperfect data should be expected to produce a filtered or blurred image of the object or region of interest. Apparent resolution is therefore an important figure of merit for practical imaging and inversion schemes, since the user of such schemes will eventually want to know what the smallest object is that can be distinguished. Judgments concerning reliability of features observed in the reconstructed image depend strongly on our understanding of inherent resolution capabilities of the inverse method used to produce the image. Therefore, a convenient quantitative measure of pointwise or cellwise model reliability is certainly helpful and perhaps essential for subsequent interpretation.

For completeness, I will review the main ideas about resolution matrices. For linear inversion problems, resolution matrices can be understood most easily by considering a matrix equation of the form

$\matM\vecs = \vect$,   (Mst)

and first asking the question: Given matrix $\matM$ and data vector $\vect$, what model vector $\vecs$ solves this equation? When the matrix $\matM$ is square and invertible, the answer to the question is relatively easy: $\vecs = \matM^{-1}\vect$, with $\matM^{-1}$ being the usual matrix inverse of $\matM$. However, it often happens in geophysical inversion problems that $\matM$ is not square, or not invertible even if it is square. In these situations, the least-squares method is often used, resulting in the normal equations

$\matM^T\matM\vecs = \matM^T\vect$,   (normaleqns)

which can often be solved approximately for $\vecs$ since the normal matrix $\matM^T\matM$ is square and symmetric -- although it may still be singular. It proves convenient now to introduce an approximate inverse $\matM^\dagger$ called the Moore-Penrose pseudoinverse (Moore, 1920; Penrose, 1955a). This generalized inverse is the unique matrix that satisfies the four conditions: $\matM\matM^\dagger\matM = \matM$, $\matM^\dagger\matM\matM^\dagger = \matM^\dagger$, $\matM^\dagger\matM = (\matM^\dagger\matM)^T$, and $\matM\matM^\dagger = (\matM\matM^\dagger)^T$. Although other choices for the approximate inverse are known [for example, see Rao (1965)], I will restrict discussion here to this best known approximate inverse. Then, after multiplying (Mst) on the left by $\matM^\dagger$, the result is

$\matM^\dagger\matM\vecs = \matM^\dagger\vect$.   (MMsMt)

If it were true that $\matM^\dagger\matM = \matI$ (the identity matrix), then I would have solved the inversion problem exactly, and also have perfect model resolution. But it is precisely in those problems for which no such inverse exists that the following analysis is needed. Define the matrix coefficient of $\vecs$ in (MMsMt) to be the resolution matrix

$\calR \equiv \matM^\dagger\matM$.   (RMM)

The deviations of $\calR$ from the identity matrix $\matI$, i.e., the components of the difference matrix $\matI - \calR$, determine the degree of distrust appropriate for the components of the solution vector $\vecs$ that are most poorly resolved.
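To make these definitions concrete, the short numpy sketch below (illustrative only; it uses a random rank-deficient matrix in place of a real ray-path matrix) forms the Moore-Penrose pseudoinverse, checks the four Penrose conditions quoted above, and examines how far $\calR = \matM^\dagger\matM$ is from the identity.

```python
import numpy as np

# Illustrative only: a random, rank-deficient stand-in for the ray-path matrix M.
rng = np.random.default_rng(0)
m, n, r = 8, 12, 5
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r < min(m, n)

# Moore-Penrose pseudoinverse (explicit rcond so the known zero singular
# values are safely truncated).
Mdag = np.linalg.pinv(M, rcond=1e-10)

# The four Penrose conditions quoted in the text.
assert np.allclose(M @ Mdag @ M, M)
assert np.allclose(Mdag @ M @ Mdag, Mdag)
assert np.allclose((Mdag @ M).T, Mdag @ M)
assert np.allclose((M @ Mdag).T, M @ Mdag)

# Model resolution R = M^dagger M; deviations of R from the identity flag
# the poorly resolved model components.
R = Mdag @ M
print("trace(R) (should equal the rank r = 5):", R.trace())
print("largest entry of |I - R|:", np.abs(np.eye(n) - R).max())
```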


  
Figure 1: Schematic illustration of ray paths through a slowness model with rectangular cells. Each traveltime in the figure satisfies $t_i = \sum_{j=1}^{16} l_{ij}s_j$.

For definiteness, consider the seismic tomography problem (see Figure 1): $\matM$ is an $m\times n$ ray-path matrix, $\vect$ is a data m-vector of first arrival traveltimes, and $\vecs$ is the model n-vector for (possibly rectangular) cells of constant slowness (inverse velocity). I seek the slownesses $\vecs$ given the measured traveltimes in $\vect$ and the estimates of the ray paths between source and receiver locations contained in the matrix $\matM$ [see Berryman (1990) and references therein]. Then, the resolution matrix defined in (RMM) is the model resolution, since the slowness vector $\vecs$ is the desired model of acoustic wave slowness. A data resolution matrix may also be defined. First, multiply (MMsMt) on the left by $\matM$ so

$\matM\matM^\dagger\matM\vecs = \matM\vecs = \matM\matM^\dagger\vect$,   (MMMsMMt)

and then compare (MMMsMMt) to (Mst), noting that the matrix product multiplying $\vect$ should equal the identity matrix if the approximate inverse $\matM^\dagger$ is a true inverse. Again, deviations of this matrix from the identity provide information about the degree to which the computed solution makes use of all the data in $\vect$. Thus, the data resolution matrix (Jackson, 1972; Wiggins, 1972) is defined by

$\calR_{data} \equiv \matM\matM^\dagger$,   (Rd)

while the resolution matrix defined previously in (RMM) is the model resolution (Backus and Gilbert, 1968; Jackson, 1972)

$\calR_{model} \equiv \matM^\dagger\matM$.   (Rm)

Furthermore, for seismic inversion, I must also consider mathematical nonlinearities involved in the process of finding a ray-path matrix $\matM$ that is consistent with the model $\vecs$. For the present purposes, assume that $\matM$ and $\vecs$ are the final (and mutually consistent) products of an iterative algorithm (Berryman, 1990). Then, the question of resolution needs to be studied carefully in order to explore fully the range of possible solutions resulting from inherent nonuniqueness of the inverse problem.

The significance of these two resolution matrices may be understood by considering the singular value decomposition (SVD) of the matrix $\matM$, given by

$\matM = \sum_{i=1}^r \lambda_i \vecu_i\vecv_i^T$,

where the m-vectors $\vecu_i$ and n-vectors $\vecv_i$ are the eigenvectors of $\matM$ determined by $\matM\vecv_i = \lambda_i\vecu_i$ and $\vecu_i^T\matM = \lambda_i\vecv_i^T$ and the $\lambda_i$s are the eigenvalues. The eigenvectors are also assumed to satisfy orthonormality conditions $\vecu_i^T\vecu_j = \delta_{ij}$ and $\vecv_i^T\vecv_j = \delta_{ij}$. The rank of $\matM$ (the number of nonzero eigenvalues) has a value $r \le \min(m,n)$. The Moore-Penrose pseudoinverse is then known to be given by

$\matM^\dagger = \sum_{i=1}^r \lambda_i^{-1}\vecv_i\vecu_i^T$,

so the resolution matrices are written explicitly in terms of sums of the outer products of the eigenvectors as

$\calR_{model} = \sum_{i=1}^r \vecv_i\vecv_i^T$   (vv)

and

$\calR_{data} = \sum_{i=1}^r \vecu_i\vecu_i^T$.   (uu)

When displayed in this form, it is clear that the resolution matrices simply express the completeness of the resolved model or data spaces respectively. They are projection operators on the span of the resolved parts of the model and data vector spaces.
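The projector interpretation is easy to verify numerically. The following sketch (again with a random test matrix, not part of the original derivation) builds the outer-product sums (vv) and (uu) from a truncated SVD and confirms that they are idempotent and reproduce $\matM^\dagger\matM$ and $\matM\matM^\dagger$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 12
M = rng.standard_normal((m, 5)) @ rng.standard_normal((5, n))   # rank 5

# SVD; keep only the nonzero singular values (the "eigenvalues" lambda_i above).
U, lam, Vt = np.linalg.svd(M, full_matrices=False)
keep = lam > 1e-10 * lam.max()
Ur, Vr = U[:, keep], Vt[keep].T

R_model = Vr @ Vr.T            # sum over i of v_i v_i^T
R_data  = Ur @ Ur.T            # sum over i of u_i u_i^T

# Both are projectors and agree with the pseudoinverse definitions (Rm), (Rd).
Mdag = np.linalg.pinv(M, rcond=1e-10)
print(np.allclose(R_model @ R_model, R_model), np.allclose(R_model, Mdag @ M))
print(np.allclose(R_data @ R_data, R_data), np.allclose(R_data, M @ Mdag))
```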

COMPUTING RESOLUTION

Now it is important to recognize that, although the resolution matrices have generally been defined by equations (Rd) and (Rm), or by (vv) and (uu) -- which implicitly assume that a singular value decomposition has been performed -- it may nevertheless be possible to compute these matrices in other ways. Of particular importance is the computation of an effective inverse matrix generated by an iterative inversion procedure.

To establish the plausibility of computing resolution without singular value decomposition, first consider a simple pedagogical example that would not be ideal for computations. For convenience define $\eta$ to be a parameter having the significance of a continuous iteration number and let $\matX(\eta)$ be the current approximation to the pseudoinverse $\matM^\dagger$. Then the current value of the approximate solution vector is given by $\vecs(\eta) = \matX(\eta)\vect$ and the unresolved part of the data vector is clearly given by the difference vector $\Delta\vect = \vect - \matM\matX(\eta)\vect$. The length of this vector is a scalar measure of the unresolved portion of the data. By designing the iterative inversion scheme to decrease the length of this vector progressively as $\eta\to \infty$, the derivative of its square with respect to the continuous iteration number $\eta$ is given by

$\frac{d}{d\eta}\left[\vect - \matM\matX(\eta)\vect\right]^T\left[\vect - \matM\matX(\eta)\vect\right] = -\,\vect^T\frac{d\matX^T}{d\eta}\matM^T\left[\vect - \matM\matX(\eta)\vect\right] - \left[\vect - \matM\matX(\eta)\vect\right]^T\matM\frac{d\matX}{d\eta}\vect$.   (firstderivative)

A sufficient condition for the traveltime data resolution to improve continuously as $\eta\to \infty$ is then [see Lu and Berryman (1990)] the equation of motion for $\matX(\eta)$ given by

$\frac{d\matX(\eta)}{d\eta} = \gamma\matM^T\left[\matI - \matM\matX(\eta)\right]$,   (eom)

where $\gamma > 0$ is some arbitrary scalar that determines the rate of convergence. It follows by construction that the right hand side of (firstderivative) is always negative or zero. Thus, the nonnegative length $\vert\vect-\matM\matX(\eta)\vect\vert$ is a continuously decreasing function of the iteration parameter $\eta$ as long as $\matX(\eta)$ has the equation of motion given by (eom). Clearly, the right hand side of (eom) vanishes if and only if the approximate inverse matrix satisfies

$\matM^T\matM\matX(\infty) = \matM^T$,

which is equivalent to the normal equations of least-squares, so $\matX(\infty) = \matM^\dagger$ as expected (Penrose, 1955b).

Defining effective resolution matrices $\calE_{model}(\eta) \equiv \matX(\eta)\matM$ and $\calE_{data}(\eta) \equiv \matM\matX(\eta)$, then it follows from (eom) that

$\frac{d\calE_{model}}{d\eta} = \gamma\matM^T\matM\left(\matI - \calE_{model}\right)$

and

$\frac{d\calE_{data}}{d\eta} = \gamma\matM\matM^T\left(\matI - \calE_{data}\right)$.

Assuming initial conditions for the integration are given by $\calE_{model}(0) = \matO$ and $\calE_{data}(0) = \matO$, it is easy to see that the solutions of these equations will be symmetric matrices as desired. This approach becomes an iterative procedure when the equations are solved numerically by discretizing the independent parameter $\eta$ and stepping from one discrete value of $\eta$ to the next.
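A minimal numerical sketch of this continuous-iteration idea is given below; the synthetic matrix, the choice $\gamma = 1$, the Euler step size, and the number of steps are all illustrative assumptions, not values from the paper. It integrates (eom) with forward-Euler steps and shows the effective resolution matrices approaching the projectors $\calR_{model}$ and $\calR_{data}$.

```python
import numpy as np

# Illustrative sketch only: a small synthetic ray-path-like matrix with known
# singular values, and crude forward-Euler steps in the iteration parameter eta.
rng = np.random.default_rng(2)
m, n, r = 6, 10, 4
sv = np.array([3.0, 2.0, 1.5, 1.0])                  # the nonzero singular values
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
M = U @ np.diag(sv) @ V.T                            # rank-4, 6 x 10
t = rng.standard_normal(m)

gamma, d_eta = 1.0, 0.1          # forward Euler is stable here: d_eta*gamma*sv.max()**2 < 2
X = np.zeros((n, m))             # X(0) = 0, so E_model(0) = E_data(0) = 0
for _ in range(500):
    X += d_eta * gamma * M.T @ (np.eye(m) - M @ X)   # dX/deta = gamma M^T (I - M X)

E_model, E_data = X @ M, M @ X                       # effective resolution matrices
Mdag = np.linalg.pinv(M, rcond=1e-10)
print("max |E_model - M^dagger M|:", np.abs(E_model - Mdag @ M).max())
print("max |E_data  - M M^dagger|:", np.abs(E_data - M @ Mdag).max())
print("unresolved data, |t - M X(eta) t|:", np.linalg.norm(t - E_data @ t))
```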

The procedure just outlined establishes that iterative procedures for computing resolution matrices are certainly possible. However, this particular method is less than ideal because it would be computationally intensive. So next consider a general procedure that could be applied to many iterative methods that are used in practice.

To find a more efficient approach that is also inherently symmetric, first analyze the SVD of the normal matrix, i.e.,

$\matM^T\matM = \sum_{i=1}^r \lambda_i\vecv_i\vecu_i^T \sum_{j=1}^r \lambda_j\vecu_j\vecv_j^T = \sum_{i=1}^r \lambda_i^2\vecv_i\vecv_i^T$.

Then, it is easily shown that

$\left(\matM^T\matM\right)^\dagger\matM^T\matM = \sum_{i=1}^r \vecv_i\vecv_i^T = \calR_{model}$,   (Rmagain)

and similarly that an alternative is given by

$\matM^T\left(\matM\matM^T\right)^\dagger\matM = \sum_{i=1}^r \vecv_i\vecv_i^T = \calR_{model}$,

a formula that is automatically symmetric. The data resolution is given by the various alternative forms

$\matM\matM^\dagger = \matM\matM^T\left(\matM\matM^T\right)^\dagger = \matM\left(\matM^T\matM\right)^\dagger\matM^T = \sum_{j=1}^r \vecu_j\vecu_j^T = \calR_{data}$.   (Rdagain)
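These equivalent forms are easily checked numerically; the brief sketch below (random illustrative matrix, not an actual ray-path matrix) confirms that the normal-matrix expressions reproduce the pseudoinverse definitions.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((7, 4)) @ rng.standard_normal((4, 11))   # rank 4, 7 x 11

Mdag = np.linalg.pinv(M, rcond=1e-10)            # Moore-Penrose pseudoinverse of M
Ndag = np.linalg.pinv(M.T @ M, rcond=1e-10)      # pseudoinverse of the normal matrix

print(np.allclose(Mdag @ M, Ndag @ (M.T @ M)))   # model resolution via (M^T M)^dagger
print(np.allclose(M @ Mdag, M @ Ndag @ M.T))     # data  resolution via (M^T M)^dagger
```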

Iterative methods such as the method of Lanczos (1950) for solving (Mst) in the least-squares sense often compute the pseudoinverse -- not of $\matM$ itself, but rather of the normal matrix $\matM^T\matM$. The appearance of the pseudoinverse $(\matM^T\matM)^\dagger$ of the normal matrix in these alternative expressions for the resolution matrices (Rmagain)-(Rdagain) then provides both a motivation and a clue to constructing methods for computing resolution with such iterative methods. Thus, in the following discussion, I am able to show how these various equivalent formulas for the resolution matrices may be used efficiently, in the course of a routine computation whose primary goal is to find an approximate solution to (Mst) by iterative methods.

TRIDIAGONALIZATION METHOD OF LANCZOS

Consider the linear inversion problem of the form $\matM\vecs = \vect$, where $\vecs$ is the unknown. For a crosshole seismic tomography problem (Berryman, 1990), $\matM$ is an $m\times n$ ray-path matrix, $\vect$ is an m-vector of first arrival traveltimes, and $\vecs$ is the model slowness (inverse velocity) vector. The method of Lanczos (1950) solves this problem by introducing a sequence of orthonormal vectors $\vecz^{(k)}$ through a process of tridiagonalization. To obtain the minimum norm least-squares solution of $\matM\vecs = \vect$, the method may be applied to the normal equations (normaleqns), since they have the form $\matA\vecs = \vecb$ with square symmetric matrix $\matA = \matM^T\matM$, unknown vector $\vecs$, and data vector $\vecb = \matM^T\vect$.

Lanczos algorithm

With $\matA = \matM^T\matM$ and $\vecb = \matM^T\vect$, the Lanczos algorithm is a projection procedure equivalent to the following:

$\vecz^{(1)}\left(\vecz^{(1)}\right)^T\vecb = \vecb$,   (lanczos1)

$\left[\vecz^{(1)}\left(\vecz^{(1)}\right)^T + \vecz^{(2)}\left(\vecz^{(2)}\right)^T\right]\matA\vecz^{(1)} = \matA\vecz^{(1)}$,   (lanczos2)

and, for $k \ge 2$,

$\left[\vecz^{(k-1)}\left(\vecz^{(k-1)}\right)^T + \vecz^{(k)}\left(\vecz^{(k)}\right)^T + \vecz^{(k+1)}\left(\vecz^{(k+1)}\right)^T\right]\matA\vecz^{(k)} = \matA\vecz^{(k)}$.   (lanczos3)

By simple induction on the recursion formulas, all the basis vectors are ideally orthonormal and therefore satisfy $\left(\vecz^{(i)}\right)^T\vecz^{(j)} = \delta_{ij}$. Of course, these conditions are not satisfied exactly for finite precision calculations -- leading to some practical difficulties that will not be discussed here for lack of space (Parlett, 1980).

For the application to traveltime tomography, the pertinent constants are defined by $N_1 = \left\vert\matM^T\vect\right\vert = \left(\vecz^{(1)}\right)^T\matM^T\vect$, $D_k = \left(\vecz^{(k)}\right)^T\matM^T\matM\vecz^{(k)}$ for $k = 1,2,\ldots$, and $N_{k+1} = \left(\vecz^{(k+1)}\right)^T\matM^T\matM\vecz^{(k)}$ for $k = 1,2,\ldots$. The equations (lanczos1)-(lanczos3) determine a tridiagonal system of the form

$\vecz^{(k+1)}\vece_k^T N_{k+1} + \matZ_k\matT_k = \matM^T\matM\matZ_k \quad\hbox{for}\quad 2 \le k \le r$,   (tridiagonal)

where the unit vector $\vece_k$ is all zeroes except for a one in the kth position. The tridiagonal matrix of coefficients is defined by

$\matT_k = \pmatrix{D_1 & N_2 & & & \cr N_2 & D_2 & N_3 & & \cr & N_3 & D_3 & N_4 & \cr & & \ddots & \ddots & \ddots \cr & & & N_k & D_k\cr} \quad\hbox{for}\quad 2 \le k \le r$,

and where the matrix $\matZ_k = \pmatrix{\vecz^{(1)}&\vecz^{(2)}&\vecz^{(3)}&\ldots&\vecz^{(k)}\cr}$ is composed of the resulting orthonormal vectors. In practical implementations of the algorithm, the constants $N_{k+1}$ are generally found through the normalization condition implicit in (lanczos3).

Assuming infinite precision, the process stops when k=r (the rank of the matrix) because then $N_{r+1} \equiv 0$ (or is numerically negligible). It follows from (tridiagonal) that this tridiagonalization process results in the identity

$\matM^T\matM = \matZ_r\matT_r\matZ_r^T$.

Since $\matT_r$ is invertible, the Moore-Penrose inverse (Penrose, 1955a) of the normal matrix is given by

$\left(\matM^T\matM\right)^\dagger = \matZ_r\left(\matT_r\right)^{-1}\matZ_r^T$.   (Adagger)

The solution to the least-squares inversion problem may therefore be written as $\vecs = \matZ_r\left(\matT_r\right)^{-1}\matZ_r^T\matM^T\vect = N_1\matZ_r\left(\matT_r\right)^{-1}\vece_1$. The first column of the inverse of $\matT_r$ (the only one needed) may be found using an elementary recursion relation.
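The following numpy sketch of the procedure just described is illustrative only (a random rank-deficient test matrix, an ad hoc termination tolerance, and a direct solve of the small tridiagonal system instead of the recursion relation): it builds $\matZ_r$ and $\matT_r$ from the three-term recursion and recovers the minimum norm least-squares solution $\vecs = N_1\matZ_r\left(\matT_r\right)^{-1}\vece_1$.

```python
import numpy as np

def lanczos_normal(M, t, tol=1e-8):
    """Lanczos tridiagonalization of A = M^T M started from b = M^T t.

    Returns the orthonormal basis Z (n x r), the tridiagonal matrix T_r with
    diagonal entries D_k and off-diagonal entries N_{k+1}, and N_1 = |b|.
    Illustrative sketch only: no safeguard against loss of orthogonality.
    """
    b = M.T @ t
    N1 = np.linalg.norm(b)
    z, z_prev, beta = b / N1, np.zeros_like(b), 0.0
    Z, diag, offdiag = [z], [], []
    for _ in range(M.shape[1]):
        w = M.T @ (M @ z) - beta * z_prev     # A z^(k) minus the previous direction
        alpha = z @ w                         # D_k
        w -= alpha * z
        diag.append(alpha)
        beta = np.linalg.norm(w)              # N_{k+1}
        if beta < tol * N1:                   # k = r (the rank) has been reached
            break
        offdiag.append(beta)
        z_prev, z = z, w / beta
        Z.append(z)
    offdiag = offdiag[:len(diag) - 1]
    T = np.diag(diag) + np.diag(offdiag, 1) + np.diag(offdiag, -1)
    return np.column_stack(Z[:len(diag)]), T, N1

# Illustrative random test problem (not an actual ray-path matrix).
rng = np.random.default_rng(4)
M = rng.standard_normal((9, 3)) @ rng.standard_normal((3, 12))   # rank 3, 9 x 12
t = rng.standard_normal(9)

Z, T, N1 = lanczos_normal(M, t)
e1 = np.zeros(T.shape[0]); e1[0] = 1.0
s = N1 * Z @ np.linalg.solve(T, e1)                   # s = N_1 Z_r T_r^{-1} e_1
s_ref = np.linalg.pinv(M, rcond=1e-10) @ t            # minimum-norm least-squares solution
print("max |s - pinv(M) t|:", np.abs(s - s_ref).max())
```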

Lanczos resolution

Since the Lanczos algorithm directly produces a sequence of orthonormal vectors in the model space, it is straightforward to see that the model resolution matrix for this method is

$\calR_{model} = \matM^T\matM\left(\matM^T\matM\right)^\dagger = \matZ_r\matZ_r^T$,   (modelreslanczos)

which is also clearly symmetric. I compute the data resolution matrix using the fact that $\calR_{data} = \matM\matM^\dagger = \matM\left(\matM^T\matM\right)^\dagger\matM^T$ together with (Adagger), so that

$\calR_{data} = \matM\matZ_r\left(\matT_r\right)^{-1}\matZ_r^T\matM^T$.   (datareslanczos)

Both resolution matrices are symmetric if the full Lanczos inverse is computed.

Assuming only that orthogonality has been maintained among the vectors $\vecz^{(k)}$ [a nontrivial assumption for realistic applications (Parlett, 1980)], the matrix $\matZ_r$ just spans the model space associated with the ray-path matrix $\matM$. Thus, (modelreslanczos) is understood intuitively as the result one obtains by finding a new orthonormal set of basis vectors for the span of the resolution matrix. In other methods such as the Paige-Saunders LSQR algorithm (Paige and Saunders, 1982), it is also possible to obtain a corresponding formula for the data resolution. However, for the Lanczos algorithm, the data resolution must be computed indirectly from the available information using (datareslanczos).
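A short self-contained check of (modelreslanczos) and (datareslanczos) follows; it repeats the Lanczos sketch above on a random illustrative matrix and compares the assembled resolution matrices with the pseudoinverse definitions.

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((9, 3)) @ rng.standard_normal((3, 12))   # rank 3, 9 x 12
t = rng.standard_normal(9)

# Plain Lanczos on A = M^T M started from b = M^T t (same sketch as above).
b = M.T @ t
z, z_prev, beta = b / np.linalg.norm(b), np.zeros(M.shape[1]), 0.0
Z, diag, offdiag = [z], [], []
for _ in range(M.shape[1]):
    w = M.T @ (M @ z) - beta * z_prev
    alpha = z @ w
    w -= alpha * z
    diag.append(alpha)
    beta = np.linalg.norm(w)
    if beta < 1e-8 * np.linalg.norm(b):      # terminate at k = r
        break
    offdiag.append(beta)
    z_prev, z = z, w / beta
    Z.append(z)
Z = np.column_stack(Z[:len(diag)])
offdiag = offdiag[:len(diag) - 1]
T = np.diag(diag) + np.diag(offdiag, 1) + np.diag(offdiag, -1)

# Resolution matrices assembled from the Lanczos quantities.
R_model = Z @ Z.T                                   # (modelreslanczos)
R_data  = M @ Z @ np.linalg.solve(T, Z.T @ M.T)     # (datareslanczos)

Mdag = np.linalg.pinv(M, rcond=1e-10)
print("model resolution error:", np.abs(R_model - Mdag @ M).max())
print("data  resolution error:", np.abs(R_data - M @ Mdag).max())
```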

ARNOLDI INVERSION WITH REORTHOGONALIZATION

Once again, consider the linear inversion problem of the form $\matM\vecs = \vect$, where $\vecs$ is the unknown. For a crosshole seismic tomography problem (Berryman, 1990), $\matM$ is an $m\times n$ ray-path matrix, $\vect$ is an m-vector of first arrival traveltimes, and $\vecs$ is the model slowness (inverse velocity) vector. The method of Arnoldi (1951) solves this problem by introducing a sequence of orthonormal vectors $\vecv^{(k)}$ through a process similar to Lanczos tridiagonalization, but with all vectors $\vecv^{(k)}$ reorthogonalized against all previous $\vecv^{(i)}$ for $i < k$. To obtain the minimum norm least-squares solution of $\matM\vecs = \vect$, the method may be applied to the normal equations (normaleqns), since they have the form $\matA\vecs = \vecb$ with square symmetric matrix $\matA = \matM^T\matM$, unknown vector $\vecs$, and data vector $\vecb = \matM^T\vect$.

Arnoldi algorithm

With $\matA = \matM^T\matM$ and $\vecb = \matM^T\vect$, the Arnoldi algorithm is a projection procedure equivalent to the following:

$\vecv^{(1)}\left(\vecv^{(1)}\right)^T\vecb = \vecb$,   (arnoldi1)

$\left[\vecv^{(1)}\left(\vecv^{(1)}\right)^T + \vecv^{(2)}\left(\vecv^{(2)}\right)^T\right]\matA\vecv^{(1)} = \matA\vecv^{(1)}$,   (arnoldi2)

and, for $k \ge 2$,

$\left[\sum_{i=1}^{k+1}\vecv^{(i)}\left(\vecv^{(i)}\right)^T\right]\matA\vecv^{(k)} = \matA\vecv^{(k)}$.   (arnoldi3)

By construction, all the basis vectors satisfy $\left(\vecv^{(i)}\right)^T\vecv^{(j)} = \delta_{ij}$, so to numerical accuracy the vectors are orthonormal and furthermore they would each be identical to the corresponding vectors appearing in the Lanczos process if the computations could be performed with infinite precision. Results would be identical to the results of the Lanczos method in this idealized situation. In fact, (arnoldi1) and (arnoldi2) are identical to (lanczos1) and (lanczos2), while (arnoldi3) is identical to (lanczos3) for k=2. Differences only begin to arise for $k \ge 3$.

For the application to traveltime tomography, the pertinent constants are defined by $N_1 = \left\vert\matM^T\vect\right\vert = \left(\vecv^{(1)}\right)^T\matM^T\vect$, $d_k = \left(\vecv^{(k)}\right)^T\matM^T\matM\vecv^{(k)}$ for $k = 1,2,\ldots$, and $n_{i,j} = \left(\vecv^{(i)}\right)^T\matM^T\matM\vecv^{(j)}$ for $i = 1,2,\ldots,k+1$ and $j = i-1,i+1,\ldots,k$ (except that $j\ne 0$). Note that $N_1$ is exactly the same as the constant of the same name in the Lanczos algorithm. The equations (arnoldi1)-(arnoldi3) determine a system of the form

$\matV_{k+1}\matH_{k+1} = \matM^T\matM\matV_k \quad\hbox{for}\quad 2 \le k \le r$,   (hessenberg)

where the upper Hessenberg matrix ($(k+1)\times k$) of coefficients is defined by

$\matH_{k+1} = \pmatrix{d_1 & n_{1,2} & n_{1,3} & n_{1,4} & \ldots & n_{1,k}\cr n_{2,1} & d_2 & n_{2,3} & n_{2,4} & \ldots & n_{2,k}\cr & n_{3,2} & d_3 & n_{3,4} & \ldots & \vdots\cr & & n_{4,3} & \ddots & \ddots & \vdots\cr & & & \ddots & d_{k-1} & n_{k-1,k}\cr & & & & n_{k,k-1} & d_k\cr & & & & & n_{k+1,k}\cr} \quad\hbox{for}\quad 2 \le k \le r$,   (hessenbergmatrix)

and where the matrix $\matV_k = \pmatrix{\vecv^{(1)}&\vecv^{(2)}&\vecv^{(3)}&\ldots&\vecv^{(k)}\cr}$ is composed of the resulting orthonormal vectors.

Assuming infinite precision, the process stops when $k=r$ (the rank of the matrix) because then $n_{r+1,r} \equiv 0$ (or is numerically negligible). Thus, the bottom row of the Hessenberg matrix $\matH_{r+1}$ is full of zeroes, and therefore may be truncated to create the square ($r\times r$) matrix $\overline{\matH}_{r}$ which contains only the first r rows of $\matH_{r+1}$. Then, it follows from (hessenberg) that this process results in the identity

$\matM^T\matM = \matV_r\overline{\matH}_r\matV_r^T$.

Since $\overline{\matH}_{r}$ is invertible, the Moore-Penrose inverse (Penrose, 1955a) of the normal matrix is given by

$\left(\matM^T\matM\right)^\dagger = \matV_r\left(\overline{\matH}_r\right)^{-1}\matV_r^T$.   (AdaggerH)

The solution to the least-squares inversion problem may therefore be written as $\vecs = \matV_r\left(\overline{\matH}_r\right)^{-1}\matV_r^T\matM^T\vect = N_1\matV_r\left(\overline{\matH}_r\right)^{-1}\vece_1$. The first column of the inverse of $\overline{\matH}_{r}$ is the only one needed and may be found using an elementary recursion relation.
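As with the Lanczos case, a brief illustrative numpy sketch is included below (random test matrix, ad hoc tolerance, and a direct solve of the small Hessenberg system in place of the recursion relation): it runs the Arnoldi recursion with explicit reorthogonalization, truncates $\matH_{r+1}$ to $\overline{\matH}_r$, and recovers $\vecs = N_1\matV_r\left(\overline{\matH}_r\right)^{-1}\vece_1$.

```python
import numpy as np

def arnoldi_normal(M, t, tol=1e-8):
    """Arnoldi iteration on A = M^T M started from b = M^T t, with each new
    vector explicitly reorthogonalized against all previous basis vectors.

    Returns the orthonormal basis V_r (n x r), the truncated square upper
    Hessenberg matrix Hbar_r (r x r), and N_1 = |b|.  Illustrative sketch only.
    """
    b = M.T @ t
    N1 = np.linalg.norm(b)
    n = M.shape[1]
    V = [b / N1]
    H = np.zeros((n + 1, n))
    r = n
    for k in range(n):
        w = M.T @ (M @ V[k])                 # apply A to the latest basis vector
        for i in range(k + 1):               # explicit reorthogonalization step
            H[i, k] = V[i] @ w
            w -= H[i, k] * V[i]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] < tol * N1:           # bottom entry ~ 0: the rank r is reached
            r = k + 1
            break
        V.append(w / H[k + 1, k])
    return np.column_stack(V[:r]), H[:r, :r], N1

# Illustrative random test problem (not an actual ray-path matrix).
rng = np.random.default_rng(6)
M = rng.standard_normal((10, 4)) @ rng.standard_normal((4, 14))   # rank 4, 10 x 14
t = rng.standard_normal(10)

Vr, Hbar, N1 = arnoldi_normal(M, t)
e1 = np.zeros(Hbar.shape[0]); e1[0] = 1.0
s = N1 * Vr @ np.linalg.solve(Hbar, e1)               # s = N_1 V_r Hbar_r^{-1} e_1
s_ref = np.linalg.pinv(M, rcond=1e-10) @ t            # minimum-norm least-squares solution
print("max |s - pinv(M) t|:", np.abs(s - s_ref).max())
```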

Arnoldi resolution

Like the Lanczos algorithm, the Arnoldi algorithm produces a sequence of orthonormal vectors in the model space. It is therefore straightforward to see that the model resolution matrix for this method is

$\calR_{model} = \matM^T\matM\left(\matM^T\matM\right)^\dagger = \matV_r\matV_r^T$,   (modelresarnoldi)

which is also clearly symmetric. In analogy with the results for the Lanczos algorithm, I can again compute the data resolution matrix using the fact that $\calR_{data} = \matM\matM^\dagger = \matM\left(\matM^T\matM\right)^\dagger\matM^T$ together with (AdaggerH), so

$\calR_{data} = \matM\matV_r\left(\overline{\matH}_r\right)^{-1}\matV_r^T\matM^T$.   (dataresarnoldi)

Both resolution matrices are symmetric if the full Arnoldi inverse is computed.

By design, orthogonality is maintained among the vectors $\vecv^{(k)}$, so the matrix $\matV_r$ just spans the model space associated with the ray-path matrix $\matM$. Thus, (modelresarnoldi) is understood intuitively as the result one obtains by finding a new orthonormal set of basis vectors for the span of the resolution matrix. However, for the Arnoldi algorithm as for the Lanczos algorithm, the data resolution must be computed indirectly from the available information using (dataresarnoldi).
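A self-contained numerical check of (modelresarnoldi) and (dataresarnoldi) on a random illustrative matrix follows, repeating the Arnoldi sketch above and comparing against the pseudoinverse definitions.

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((10, 4)) @ rng.standard_normal((4, 14))   # rank 4, 10 x 14
t = rng.standard_normal(10)

# Arnoldi on A = M^T M with full reorthogonalization (same sketch as above).
b = M.T @ t
n = M.shape[1]
V = [b / np.linalg.norm(b)]
H = np.zeros((n + 1, n))
r = n
for k in range(n):
    w = M.T @ (M @ V[k])
    for i in range(k + 1):
        H[i, k] = V[i] @ w
        w -= H[i, k] * V[i]
    H[k + 1, k] = np.linalg.norm(w)
    if H[k + 1, k] < 1e-8 * np.linalg.norm(b):
        r = k + 1
        break
    V.append(w / H[k + 1, k])
Vr, Hbar = np.column_stack(V[:r]), H[:r, :r]

# Resolution matrices assembled from the Arnoldi quantities.
R_model = Vr @ Vr.T                                     # (modelresarnoldi)
R_data  = M @ Vr @ np.linalg.solve(Hbar, Vr.T @ M.T)    # (dataresarnoldi)

Mdag = np.linalg.pinv(M, rcond=1e-10)
print("model resolution error:", np.abs(R_model - Mdag @ M).max())
print("data  resolution error:", np.abs(R_data - M @ Mdag).max())
```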

DISCUSSION

If computations could be performed with infinite precision, the Lanczos and Arnoldi algorithms would be the same. To see that this is so, consider the definition of the Arnoldi matrix element $n_{i,j}$:

$n_{i,j} \equiv \left(\vecv^{(i)}\right)^T\matM^T\matM\vecv^{(j)}$.

Since the normal matrix $\matM^T\matM$ is symmetric, it follows immediately that $n_{i,j} = n_{j,i}$, at least in principle. For example, the Lanczos algorithm computes $N_2 = n_{2,1}$ and then uses the identity based on the symmetric nature of the normal matrix to make the substitution $n_{1,2} = N_2$. The remaining matrix elements in the first row of the Hessenberg matrix (hessenbergmatrix) then vanish, $n_{1,3} = n_{1,4} = \cdots = n_{1,k} = 0$, again in principle. In practice, these elements do not quite vanish to working precision. By retaining these nonzero matrix elements in the algorithm, Arnoldi provides an algorithmically efficient method of dealing with the nonorthogonality of the basis vectors that would be generated by using the pure Lanczos method without any form of reorthogonalization. The price that must be paid for this improvement is the increase in storage needed to accommodate all the basis vectors, which must be retained to compute the far off-diagonal matrix elements used in the reorthogonalization.
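The practical difference is easy to demonstrate. The sketch below (a synthetic matrix with one well-separated singular value, chosen so that a Ritz value converges quickly; the sizes and iteration counts are arbitrary illustrative assumptions) runs the plain three-term recursion and the fully reorthogonalized recursion side by side and prints the departure of each computed basis from orthonormality. Typically the plain Lanczos basis loses orthogonality by many orders of magnitude, while the reorthogonalized basis stays at machine precision.

```python
import numpy as np

# Illustrative comparison (not from the paper): run many steps of the plain
# three-term Lanczos recursion and of the fully reorthogonalized (Arnoldi-style)
# recursion on the same normal matrix A = M^T M, then measure how orthonormal
# the computed basis really is.  The synthetic M has one well-separated singular
# value, since loss of orthogonality sets in once a Ritz value converges.
rng = np.random.default_rng(8)
m, n, k_max = 100, 80, 60
sv = np.concatenate(([10.0], np.linspace(1.0, 0.5, n - 1)))
Q1, _ = np.linalg.qr(rng.standard_normal((m, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
M = Q1 @ np.diag(sv) @ Q2.T
b = M.T @ rng.standard_normal(m)

def basis(reorthogonalize):
    v, v_prev, beta = b / np.linalg.norm(b), np.zeros(n), 0.0
    V = [v]
    for _ in range(k_max):
        w = M.T @ (M @ v) - beta * v_prev
        if reorthogonalize:              # Arnoldi: project out every previous vector
            for u in V:
                w -= (u @ w) * u
        else:                            # Lanczos: only the local three-term recursion
            w -= (v @ w) * v
        beta = np.linalg.norm(w)
        v_prev, v = v, w / beta
        V.append(v)
    return np.column_stack(V)

for name, flag in [("plain Lanczos", False), ("Arnoldi (reorthogonalized)", True)]:
    B = basis(flag)
    print(f"{name:28s} max |V^T V - I| = "
          f"{np.abs(B.T @ B - np.eye(B.shape[1])).max():.1e}")
```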

ACKNOWLEDGMENTS

I thank H. Tal-Ezer for introducing me to the Arnoldi algorithm.

REFERENCES



 