
Solving $(\st A^{\dagger} \st A)^{-1}$ -- methods and problems

In geophysical filtering applications, the matrix $\st A^{\dagger} \st A$ is often, although not always, Toeplitz. (A Toeplitz matrix has constant diagonals; each row is a shifted copy of the one above.) In general, if the filter application is transient convolution (convolution with the data sequence padded with zeros), $\st A^{\dagger} \st A$ is Toeplitz, but if the filter application is internal (unpadded convolution) or truncated-transient (transient convolution truncated to the input length), $\st A^{\dagger} \st A$ is no longer Toeplitz (Claerbout, 1995). When $\st A^{\dagger} \st A$ is Toeplitz, a single row represents the whole matrix, since every row is that row shifted so that a particular element stays on the diagonal. This makes the matrix easy to store in computer memory, and easy to solve, since an efficient algorithm for solving Toeplitz systems, Levinson recursion, exists (Kanasewich, 1981).
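
As a concrete illustration, the sketch below builds the normal equations for a transient convolution and solves them by Levinson recursion. The sizes and random inputs are hypothetical, and scipy's convolution_matrix and solve_toeplitz (which implements Levinson recursion) stand in for purpose-built filtering code.
\begin{verbatim}
import numpy as np
from scipy.linalg import convolution_matrix, solve_toeplitz

# Hypothetical sizes: a 5-coefficient filter fit over a 50-sample trace.
rng = np.random.default_rng(0)
x = rng.standard_normal(50)   # input trace
d = rng.standard_normal(54)   # desired output, length 50 + 5 - 1

# Transient convolution: the data sequence is implicitly padded with
# zeros, so A maps a 5-point filter to the full convolution output.
A = convolution_matrix(x, 5, mode="full")       # shape (54, 5)
AtA = A.T @ A

# A^T A has constant diagonals (Toeplitz), so one column describes it.
assert np.allclose(AtA[1:, 1:], AtA[:-1, :-1])

# solve_toeplitz uses Levinson recursion, needing only that one column.
f = solve_toeplitz(AtA[:, 0], A.T @ d)
\end{verbatim}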

In the cases where $\st A^{\dagger} \st A$ is not Toeplitz, it is still symmetric, and this symmetry can be useful in calculating $(\st A^{\dagger} \st A)^{-1}$. Another useful feature of $\st A^{\dagger} \st A$ is that it is generally positive definite. Although $\st A^{\dagger} \st A$ is only guaranteed to be positive semidefinite, in practice a small stabilizer $\epsilon \st I$ is added to $\st A^{\dagger} \st A$ to force it to be positive definite. Adding $\epsilon \st I$ can be viewed as shifting every eigenvalue up by $\epsilon$, eliminating any zero eigenvalues. That $\st A^{\dagger} \st A$ is positive definite follows from the definition: a matrix $\st B$ is positive definite if $\sv x^{\dagger} \st B \sv x$ is always positive (Strang, 1986). If $\st B = \st A^{\dagger} \st A$ and $\st A$ has full rank, then $\sv x^{\dagger} \st A^{\dagger} \st A \sv x = (\st A \sv x)^{\dagger} (\st A \sv x)$, which is positive for any non-zero $\sv x$. Since $\st A^{\dagger} \st A$ is positive definite, the system can be solved by Cholesky factorization (Strang, 1988), which factors a positive definite matrix $\st B$ as $\st B = \st L \st L^{\dagger}$, where $\st L$ is lower triangular and $\st L^{\dagger}$ is the corresponding upper triangular matrix. The cost of solving a system with Cholesky factorization is about half that of a general linear system solver, and the method is generally accurate (Makhoul, 1975, reprinted in Childers, 1978). Efficient Cholesky routines for such systems are given in LINPACK (Dongarra et al., 1979) and in Tarantola (1987).
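
A minimal sketch of this recipe, with a random placeholder operator $\st A$ and data $\sv d$, and a small user-chosen $\epsilon$:
\begin{verbatim}
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Placeholder operator and data; eps is a small, user-chosen stabilizer.
rng = np.random.default_rng(0)
A = rng.standard_normal((54, 5))
d = rng.standard_normal(54)
eps = 1e-6

# A^T A is only guaranteed positive semidefinite; adding eps*I shifts
# every eigenvalue up by eps, making the matrix positive definite.
B = A.T @ A + eps * np.eye(A.shape[1])

# Cholesky factors B = L L^T; two triangular back-substitutions then
# solve the system at about half the cost of a general LU solver.
c, low = cho_factor(B)
f = cho_solve((c, low), A.T @ d)
assert np.allclose(B @ f, A.T @ d)
\end{verbatim}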

For a Toeplitz matrix, the storage requirements are small, since only one row of the matrix needs to be stored. In the more general case, where $\st A^{\dagger} \st A$ is symmetric positive definite, half the matrix must be stored for Cholesky factorization. When the inversion involves a fairly small matrix $\st A$, Levinson recursion or Cholesky factorization is sufficient. Later in this thesis, problems too large to be practically solved with these methods will be addressed, but even filter calculations can become difficult because of their size. For example, when calculating a one-dimensional filter, the number of elements in the matrix $\st A^{\dagger} \st A$ is the square of the number of filter coefficients. If the filter application is transient convolution, only one row of $\st A^{\dagger} \st A$ needs to be stored, but if the filter application is internal or truncated-transient convolution, $\st A^{\dagger} \st A$ is no longer Toeplitz and half of the elements of the matrix must be stored. For a short one-dimensional filter, this is generally a reasonable requirement.
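
The loss of Toeplitz structure for internal convolution is easy to check numerically. In the sketch below (hypothetical sizes again), mode "full" corresponds to transient convolution and mode "valid" to internal convolution:
\begin{verbatim}
import numpy as np
from scipy.linalg import convolution_matrix

rng = np.random.default_rng(0)
x = rng.standard_normal(50)

for mode in ("full", "valid"):   # transient vs. internal convolution
    A = convolution_matrix(x, 5, mode=mode)
    AtA = A.T @ A
    toeplitz = np.allclose(AtA[1:, 1:], AtA[:-1, :-1])
    print(mode, "-> Toeplitz:", toeplitz)   # full: True, valid: False
\end{verbatim}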

For two- or three-dimensional filters, the filter could still be calculated by inverting $\st A^{\dagger} \st A$, but the size of this matrix tends to be very large. Consider, for example, a small two-dimensional problem, where the filter is
\begin{displaymath}
\left(
\begin{array}{cc}
f_{11} & f_{12} \\
f_{21} & f_{22}
\end{array} \right),
\end{displaymath} (22)
and the data with which the filter will be convolved is
\begin{displaymath}
\left(
\begin{array}{cccc}
x_{11} & x_{12} & x_{13} & x_{14} \\
x_{21} & x_{22} & x_{23} & x_{24} \\
x_{31} & x_{32} & x_{33} & x_{34} \\
x_{41} & x_{42} & x_{43} & x_{44}
\end{array} \right).
\end{displaymath} (23)
By unwrapping and padding the data into a one-dimensional vector,
\begin{displaymath}
\left(
\begin{array}{c}
x_{11} \\ x_{21} \\ x_{31} \\ x_{41} \\ 0 \\ 0 \\
x_{12} \\ x_{22} \\ x_{32} \\ x_{42} \\ 0 \\ 0 \\
x_{13} \\ x_{23} \\ x_{33} \\ x_{43} \\ 0 \\ 0 \\
x_{14} \\ x_{24} \\ x_{34} \\ x_{44}
\end{array} \right),
\end{displaymath} (24)
then unwrapping and padding the filter into a one-dimensional vector,
\begin{displaymath}
\left(
\begin{array}{c}
f_{11} \\ f_{21} \\ 0 \\ 0 \\ 0 \\ 0 \\ f_{12} \\ f_{22}
\end{array} \right),
\end{displaymath} (25)
the two-dimensional convolution may be expressed as a one-dimensional convolution. In unwrapping the filter into one dimension, the filter has been internally padded with zeros to fill out the padded first dimension of the data, so the filtering of one column does not overlap onto the next. This padding generally makes a two-dimensional filter much longer than a typical one-dimensional filter, although some space could be saved, since the $\st A^{\dagger} \st A$ matrix will be block diagonal. For three-dimensional filters, the first two dimensions must be filled out with zeros, forming an extremely long one-dimensional filter. Despite the advantages of Levinson recursion, the size of the matrix $\st A^{\dagger} \st A$ grows as the square of the length of this padded one-dimensional filter, so for multi-dimensional problems the matrix soon becomes too large to solve in practice.
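
A short numerical sketch of this unwrapping, using random data in the 2 x 2 filter and 4 x 4 data shapes of the example above, confirms that the padded one-dimensional convolution reproduces the two-dimensional one:
\begin{verbatim}
import numpy as np
from scipy.signal import convolve2d

# Shapes from the example above: a 2x2 filter and 4x4 data.
rng = np.random.default_rng(0)
f = rng.standard_normal((2, 2))
x = rng.standard_normal((4, 4))
n1, n2 = x.shape
nf1, nf2 = f.shape

# Pad each data column with nf1 zeros so neighboring columns cannot
# overlap after unwrapping, then unwrap column by column.
stride = n1 + nf1                       # padded column length: 6
x_pad = np.zeros((stride, n2))
x_pad[:n1, :] = x
x_1d = x_pad.ravel(order="F")

# Unwrap the filter with the same stride: f11, f21, 0, 0, 0, 0, f12, f22.
f_pad = np.zeros((stride, nf2))
f_pad[:nf1, :] = f
f_1d = f_pad.ravel(order="F")[: (nf2 - 1) * stride + nf1]

# One-dimensional transient convolution of the unwrapped vectors ...
y_1d = np.convolve(x_1d, f_1d)

# ... reproduces two-dimensional transient convolution, column by column.
y_2d = convolve2d(x, f)                 # shape (5, 5)
y_cols = y_1d[: stride * (n2 + nf2 - 1)].reshape(stride, -1, order="F")
assert np.allclose(y_cols[: n1 + nf1 - 1, :], y_2d)
\end{verbatim}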

In addition to the large size of the $\st A^{\dagger} \st A$ matrix, multi-dimensional filters tend to have many more coefficients than one-dimensional filters. This increases the cost of solving for the filter by either the square or the cube of the number of coefficients, depending on the solution method. When internal convolutions with multi-dimensional filters are desired, $\st A^{\dagger} \st A$ is no longer Toeplitz, and half of this huge matrix must be stored and the system solved.

When inversions need to be done to predict full seismic datasets rather than just small filters, the number of elements to be calculated may be in the thousands or millions, so the matrix $\st A^{\dagger} \st A$ contains millions to trillions of elements. Solving, or even storing, such large matrices is generally impractical.

Later I will discuss the conjugate-gradient technique, which allows an iterative solution to these inversion problems. The conjugate-gradient technique has the advantage that the matrix $\st A^{\dagger} \st A$ never needs to be stored, and a good approximation to the answer may be obtained by limiting the number of iterations, reducing the cost. The programming also tends to be simpler, since the technique requires only the coding of the convolution and its conjugate transpose, or adjoint.
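
As a preview, here is a bare-bones conjugate-gradient sketch for the normal equations $(\st A^{\dagger} \st A) \sv f = \st A^{\dagger} \sv d$ of a transient-convolution problem. It is an illustration under hypothetical sizes, not the implementation used later; note that only the convolution and its adjoint (a correlation) appear, and $\st A^{\dagger} \st A$ is never formed or stored.
\begin{verbatim}
import numpy as np

def forward(f, x):
    # A f : transient convolution of the filter with the input trace.
    return np.convolve(x, f)

def adjoint(r, x, nf):
    # A^T r : correlation of the residual with the input trace.
    return np.correlate(r, x, mode="full")[len(x) - 1 : len(x) - 1 + nf]

def cg_filter(x, d, nf, niter):
    # Conjugate gradients on the normal equations (A^T A) f = A^T d.
    f = np.zeros(nf)
    g = adjoint(d, x, nf)          # gradient; residual of normal equations
    s = g.copy()                   # search direction
    gg = g @ g
    for _ in range(niter):
        As = forward(s, x)
        alpha = gg / (As @ As)     # step length
        f += alpha * s
        g -= alpha * adjoint(As, x, nf)
        gg_new = g @ g
        s = g + (gg_new / gg) * s  # new conjugate search direction
        gg = gg_new
    return f

# Usage: a few iterations already approximate the least-squares filter.
rng = np.random.default_rng(0)
x = rng.standard_normal(50)
d = rng.standard_normal(54)
f = cg_filter(x, d, nf=5, niter=10)
\end{verbatim}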

