
Prediction-error or annihilation filters

  One of the most common geophysical applications of inversion is the calculation of prediction-error filters. A prediction error $\sv r$ is defined as
\begin{displaymath}
r_j = \sum_{i=0}^{n} f_i d_{j-i},\end{displaymath} (26)
where $\sv f$ is the prediction-error filter of length $n$, and $\sv d$ is an input data series. The filtering operation may also be expressed as $\sv f \ast \sv d = \sv r$, where $\ast$ indicates convolution. For the sake of the discussion here, $\sv d$ is an infinitely long time series. As in the previous discussion, the error $\sv r$ and the data $\sv d$ are assumed to be stationary with a zero-mean Gaussian distribution. The error $\sv r$ is also assumed to be uncorrelated, so that $E[\sv r \sv r^{\dagger}] = \sigma_r^2 \st I$.
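
As a concrete illustration of equation (26), the sketch below (Python with numpy, using made-up filter and data values that are not from the text) applies a short prediction-error filter by convolution:
\begin{verbatim}
import numpy as np

# Hypothetical example values: a short prediction-error filter f and a data
# series d standing in for the "infinitely long" series of the text.
f = np.array([1.0, -0.5, 0.25])       # f_0, f_1, f_2
d = np.random.randn(1000)

# r_j = sum_i f_i d_{j-i} is an ordinary convolution of f with d.
r = np.convolve(f, d, mode='valid')   # prediction error; edge samples dropped
\end{verbatim}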

Application of a prediction-error filter removes the predictable information from a dataset, leaving the unpredictable information, that is, the prediction error. A typical use of prediction-error filters is seen in the deconvolution problem, where the predictable parts of a seismic trace, such as the source wavelet and multiples, are removed, leaving the unpredictable reflections.

The condition that $\sv r$ contains no predictable information may be expressed in several ways. One method is to minimize $\sv r^{\dagger} \sv r$, where $\sv r^{\dagger}$ is the conjugate transpose of $\sv r$, by calculating a filter that minimizes $(\sv f \ast \sv d)^{\dagger} (\sv f \ast \sv d)$. This minimization leaves $\sv r$ with the least energy possible, and the smallest $\sv r$ is assumed to contain only unpredictable information.
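
One conventional way to carry out this minimization, sketched below under the usual convention (not stated explicitly here) that the leading filter coefficient is fixed at one so that $\sv f = \sv 0$ is excluded, is to fit the remaining coefficients by least squares so that they predict each sample from its predecessors:
\begin{verbatim}
import numpy as np

np.random.seed(0)
d = np.random.randn(2000)            # hypothetical data series
n = 3                                # number of prediction coefficients

# Fit d[j] ~ a_1 d[j-1] + ... + a_n d[j-n] by least squares, then
# f = (1, -a_1, ..., -a_n) so that r = f * d is the prediction error.
rows = np.array([d[j-n:j][::-1] for j in range(n, len(d))])
a, *_ = np.linalg.lstsq(rows, d[n:], rcond=None)

f = np.concatenate(([1.0], -a))      # prediction-error filter
r = np.convolve(f, d, mode='valid')
print(r @ r, d[n:] @ d[n:])          # error energy does not exceed data energy
\end{verbatim}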

Another equivalent expression of unpredictability is that the non-zero lags of the normalized autocorrelation are zero, or that  
 \begin{displaymath}
{ \sum\limits_{i=-\infty}^{\infty}{ r_{i} r_{i+k}} \over \sum\limits_{i=-\infty}^{\infty} { r_{i} r_{i} } }
=
\delta_k,\end{displaymath} (27)
where $\delta_k$ is one when $k$ is zero and is zero otherwise. This condition may also be expressed as $E[r_{i} r_{i+k}]=\sigma_r^2 \delta_k$, where $\sigma_r^2$ is a scale factor that may be ignored. These two ways of expressing unpredictability are the basis for the sample calculations of the prediction-error filter presented in the next section.
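
Equation (27) can be checked numerically on a candidate error series; the sketch below (numpy, hypothetical series) estimates the normalized autocorrelation at the first few lags, which should be close to $(1, 0, 0, \ldots)$ when $\sv r$ is uncorrelated:
\begin{verbatim}
import numpy as np

def normalized_autocorrelation(r, maxlag):
    # Sample estimate of equation (27): autocorrelation at lags 0..maxlag,
    # normalized by the zero lag.
    full = np.correlate(r, r, mode='full')     # lags -(N-1) .. (N-1)
    zero = len(r) - 1
    return full[zero:zero + maxlag + 1] / full[zero]

r = np.random.randn(5000)                      # white (unpredictable) series
print(normalized_autocorrelation(r, 5))        # approximately (1, 0, 0, ...)
\end{verbatim}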

The prediction-error filter may also be defined in the frequency domain using the condition that the expectation $E[r_i r_j] = 0$ for $i \neq j$. Transforming the autocorrelation into the frequency domain gives $E[\overline{r(\omega)} r(\omega)] = 1$, where $\overline{r(\omega)}$ is the complex conjugate of $r(\omega)$. Since $\sv r$ is the convolution of $\sv f$ and $\sv d$, $r(\omega) = f(\omega) d(\omega)$, so that
\begin{displaymath}
E[\overline{r(\omega)} r(\omega)] = 1 = E[ \overline{f(\omega) d(\omega)} f(\omega) d(\omega) ].\end{displaymath} (28)
Since the filter $\sv f$ is a linear operator, it can be taken outside of the expectation (Papoulis, 1984), so the previous expression becomes
\begin{displaymath}
\frac{1}{\overline{f(\omega)} f(\omega)} = E[\overline{ d(\omega)} d(\omega)].\end{displaymath} (29)
Thus, the power spectrum of $\sv f$ is the inverse of the power spectrum of $\sv d$. Although the phase of the data $\sv d$ is lost when forming the autocorrelation of $\sv d$ to get $\overline{ d(\omega)} d(\omega)$, the phase of the filter is generally unimportant when the filter is being used as an annihilation filter in an inversion. For applications where a minimum-phase filter is required, Kolmogoroff spectral factorization (Claerbout, 1992a) may be used.
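
A minimal frequency-domain sketch of equation (29), assuming a crude periodogram estimate of the power spectrum and a small stabilizer (both choices are mine, not the text's), is:
\begin{verbatim}
import numpy as np

d = np.random.randn(4096)                 # hypothetical data
D = np.fft.rfft(d)
power_d = np.abs(D)**2 / len(d)           # rough estimate of E[|d(w)|^2]

eps = 1e-3 * power_d.mean()               # small "white noise" stabilizer
power_f = 1.0 / (power_d + eps)           # equation (29): |f(w)|^2 = 1/|d(w)|^2

# A zero-phase filter with this spectrum; the phase is unimportant when the
# filter is used only as an annihilation filter.
f_zero_phase = np.fft.irfft(np.sqrt(power_f))
\end{verbatim}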

Another way of expressing the unpredictability of $\sv r$ is $E[\sv r \sv r^{\dagger}] = \st I$, that is, the expectation of $\sv r \sv r^{\dagger}$ is the identity matrix. This states that the expectations of the cross terms of the errors are zero, that is, the errors are uncorrelated. It also states that the variances of the errors have equal weights. To make the matrices factor with an $\st L\st U$ decomposition (Strang, 1988), the expression $\sv f \ast \sv d = \sv r$ needs to be posed as a matrix operation $\st F\sv d=\sv r$, with $\st F$ as an upper triangular matrix. To do this, the indices of $\sv d$ and $\sv r$ are reversed from the usual order in their vector representations. A small example of $\st F\sv d=\sv r$ is
 \begin{displaymath}
\left(
\begin{array}{cccc}
f_1 & f_2 & f_3 & f_4 \\
0 & f_1 & f_2 & f_3 \\
0 & 0 & f_1 & f_2 \\
0 & 0 & 0 & f_1
\end{array}
\right)
\left(
\begin{array}{c}
d_4 \\ d_3 \\ d_2 \\ d_1
\end{array}
\right)
=
\left(
\begin{array}{c}
r_4 \\ r_3 \\ r_2 \\ r_1
\end{array}
\right).
\end{displaymath} (30)

Building one small realization of $\sv r \sv r^{\dagger}$ gives
 \begin{displaymath}
\left(
\begin{array}{cccc}
r_4 r_4 & r_4 r_3 & r_4 r_2 & r_4 r_1 \\
r_3 r_4 & r_3 r_3 & r_3 r_2 & r_3 r_1 \\
r_2 r_4 & r_2 r_3 & r_2 r_2 & r_2 r_1 \\
r_1 r_4 & r_1 r_3 & r_1 r_2 & r_1 r_1
\end{array}
\right).
\end{displaymath} (31)
To get an estimate of the expectation of this expression, it must be remembered that the data series is stationary, so the error is also stationary. Only the differences in the indices are important, not their absolute positions. It can be seen that the elements of (31) are the elements of the autocorrelation at various lags. Using equation (27) makes the expectation of (31) become the scaled identity matrix $\sigma_r^2 \st I$, where the $\sigma_r^2$ may be dropped, since it is only a scale factor that may be incorporated into the filter or treated as a normalization of the autocorrelation.
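
The statement that the expectation of (31) approaches a scaled identity can be verified numerically; the following sketch (numpy, with hypothetical sizes) averages the outer product over many realizations of a white series:
\begin{verbatim}
import numpy as np

# Average r r^T over many realizations of an uncorrelated, unit-variance r.
rng = np.random.default_rng(0)
realizations = rng.standard_normal((10000, 4))   # each row is one (r_4,...,r_1)
outer = np.einsum('ki,kj->ij', realizations, realizations) / 10000
print(np.round(outer, 2))                        # close to the identity matrix
\end{verbatim}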

Starting from the expression $E[\sv r \sv r^{\dagger}] = \st I$ and substituting $\st F \sv d$ for $\sv r$ gives
\begin{displaymath}
E[ (\st F\sv d) (\st F\sv d)^{\dagger} ] =\st I\end{displaymath} (32)
or
\begin{displaymath}
E[ \st F\sv d \sv d^{\dagger} \st F^{\dagger} ] =\st I.\end{displaymath} (33)
Once again, since $\st F$ and $\st F^{\dagger}$ are linear operators, they can be taken outside of the expectation (Papoulis, 1984):
\begin{displaymath}
\st F E[ \sv d \sv d^{\dagger}] \st F^{\dagger} =\st I.\end{displaymath} (34)
Moving the $\st F$s to the right-hand side gives  
 \begin{displaymath}
E[ \sv d \sv d^{\dagger}] = \st F^{-1} (\st F^{\dagger})^{-1} = (\st F^{\dagger} \st F )^{-1}.\end{displaymath} (35)
Expanding one small realization of $\sv d \sv d^{\dagger}$ gives
 \begin{displaymath}
\left(
\begin{array}{cccc}
d_4 d_4 & d_4 d_3 & d_4 d_2 & d_4 d_1 \\
d_3 d_4 & d_3 d_3 & d_3 d_2 & d_3 d_1 \\
d_2 d_4 & d_2 d_3 & d_2 d_2 & d_2 d_1 \\
d_1 d_4 & d_1 d_3 & d_1 d_2 & d_1 d_1
\end{array}
\right).
\end{displaymath} (36)
Once again, to get an estimate of the expectation of this expression, it should be remembered that $\sv d$ is stationary, and only the differences in the indices are important. It can then be seen that the $E[d_i d_{i-j}]$ are elements of the autocorrelation, and $E[ \sv d \sv d^{\dagger}]$ is the autocorrelation matrix of $\sv d$. Setting $\st A = E[ \sv d \sv d^{\dagger}]$ gives
\begin{displaymath}
\st F^{\dagger} \st F = \st A^{-1}.\end{displaymath} (37)

Generally, $E[ \sv d \sv d^{\dagger}]$ will be invertible and positive definite for real data. There are some special cases where it is not, for example, when $\sv d$ contains a single sine wave. To avoid these problems, the stabilizer $\epsilon \st I$ is often added to the autocorrelation matrix, where $\epsilon$ is a small number and $\st I$ is the identity matrix. In the geophysical industry, this is referred to as adding white noise, or whitening, since adding $\epsilon \st I$ to the autocorrelation matrix is equivalent to adding uncorrelated noise to the data $\sv d$.
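
In practice the expectation $E[ \sv d \sv d^{\dagger}]$ is replaced by a sample estimate; a sketch of building the stabilized autocorrelation (Toeplitz) matrix from a data series, with the function name and sizes chosen only for illustration, is:
\begin{verbatim}
import numpy as np
from scipy.linalg import toeplitz

def autocorrelation_matrix(d, n, eps=1e-6):
    # Sample estimate of the n-by-n autocorrelation matrix A = E[d d^H],
    # stabilized by adding a small multiple of the identity ("white noise").
    full = np.correlate(d, d, mode='full') / len(d)
    lag0 = len(d) - 1
    col = full[lag0:lag0 + n]              # autocorrelation at lags 0..n-1
    return toeplitz(col) + eps * col[0] * np.eye(n)

d = np.random.randn(5000)                  # hypothetical data series
A = autocorrelation_matrix(d, 4)
\end{verbatim}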

The matrix $\st F$ may be obtained from the matrix $ \st F^{\dagger} \st F $ by Cholesky factorization (Strang, 1988), since $ \st F^{\dagger} \st F $ is symmetric positive definite. Cholesky factorization factors a matrix into $\st L \st L^{\dagger}$, where $\st L$ looks like
\begin{displaymath}
\left(
\begin{array}{cccc}
l_{1,1} & 0 & 0 & 0 \\
l_{2,1} & l_{1,2} & 0 & 0 \\
l_{3,1} & l_{2,2} & l_{1,3} & 0 \\
l_{4,1} & l_{3,2} & l_{2,3} & l_{1,4}
\end{array}
\right).
\end{displaymath} (38)
The matrix $\st F$ obtained from this factorization will be upper triangular, as seen in (30), so the filter is seen to predict a given sample of $\sv d$ from the past samples, and the maximum filter length is the length of the window. This matrix could be considered as four filters of increasing length. The longest filter, assuming it is the most effective one, could be taken as the prediction-error filter.
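
A sketch of equations (37) and (38), assuming a small stabilized autocorrelation matrix like the one above and using scipy's Cholesky routine (the variable names are mine), is:
\begin{verbatim}
import numpy as np
from scipy.linalg import toeplitz, cholesky

# Hypothetical 4x4 autocorrelation matrix A built from a sample autocorrelation.
d = np.random.randn(5000)
acf = np.correlate(d, d, mode='full')[len(d)-1:] / len(d)
A = toeplitz(acf[:4]) + 1e-6 * acf[0] * np.eye(4)

# Equation (37): F^H F = A^{-1}.  Cholesky gives A^{-1} = L L^H with L lower
# triangular as in (38), so F = L^H is upper triangular as in (30).
L = cholesky(np.linalg.inv(A), lower=True)
F = L.T
print(np.allclose(F.T @ F, np.linalg.inv(A)))   # True
print(F[0])   # the longest of the four filters (first row of F)
\end{verbatim}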

Another way of looking at this definition of a prediction-error filter is to consider the filter as a set of weights $\st W$ producing a weighted least-squares solution. Following Strang (1986), the weighted error $\hat{\sv e}$ is $\st W \sv e$, where $\st W$ is to be determined, and $\sv e$ is the error of the original system. The best $\hat{\sv e}$ will make
\begin{displaymath}
E[ \hat{\sv e} \hat{\sv e}^{\dagger} ] = \st I.\end{displaymath} (39)
Since $\hat{\sv e}=\st W \sv e$,
\begin{displaymath}
E[ \st W \sv e (\st W \sv e)^{\dagger} ] = \st W E[\sv e \sv e^{\dagger} ] \st W^{\dagger},\end{displaymath} (40)
where $E[\sv e \sv e^{\dagger} ]$ is the covariance matrix of $\sv e$. Strang, quoting Gauss, says the best $\st W^{\dagger} \st W$ is the inverse of the covariance matrix of $\sv e$. Setting $\st F=\st W$ and $\sv d=\sv e$ makes the weight $\st W$ become the prediction-error filter seen in equation (37).
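
A small numerical check of this weighting argument, with a made-up covariance matrix, shows that choosing $\st W^{\dagger} \st W$ as the inverse covariance does whiten the errors:
\begin{verbatim}
import numpy as np
from scipy.linalg import cholesky

rng = np.random.default_rng(1)

# Hypothetical covariance matrix C = E[e e^H] of the original errors.
C = np.array([[2.0, 0.8, 0.2],
              [0.8, 2.0, 0.8],
              [0.2, 0.8, 2.0]])

# Gauss' choice: W^H W = C^{-1}; one such W is a Cholesky factor of C^{-1}.
W = cholesky(np.linalg.inv(C), lower=True).T

# The weighted errors e_hat = W e then have covariance close to the identity.
e = rng.multivariate_normal(np.zeros(3), C, size=20000)   # rows are realizations
e_hat = e @ W.T
print(np.round(e_hat.T @ e_hat / len(e_hat), 2))
\end{verbatim}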

While I've neglected a number of issues, such as the invertibility of $\sv d \sv d^{\dagger}$, the finite length of $\sv d$, and the quality of the estimates of the expectations, in practice the explicit solution to (37) will not be used to calculate a prediction-error filter. Most practical prediction-error filters are calculated using other, more efficient, methods. A traditional method for calculating a short prediction-error filter using Levinson recursion will be shown in the next section. In this thesis, most of the filters are calculated using a conjugate-gradient technique such as that shown in Claerbout (1992a).
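
For reference, a standard textbook route (not necessarily the method used in this thesis) computes a short prediction-error filter from the Yule-Walker normal equations, which scipy solves with Levinson recursion:
\begin{verbatim}
import numpy as np
from scipy.linalg import solve_toeplitz

def pef_yule_walker(d, n):
    # Order-n prediction-error filter from the Yule-Walker normal equations,
    # solved with Levinson recursion (scipy.linalg.solve_toeplitz).
    acf = np.correlate(d, d, mode='full')[len(d)-1:] / len(d)
    a = solve_toeplitz(acf[:n], acf[1:n+1])   # predict d[j] from d[j-1..j-n]
    return np.concatenate(([1.0], -a))

d = np.random.randn(5000)                     # hypothetical data
f = pef_yule_walker(d, 10)
\end{verbatim}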

Most prediction-error filters are used as simple filters, and the desired output is just the error $\sv r$ from the application of the filter, $\st F\sv d=\sv r$. Another use for these prediction-error filters is to describe some class of information in an inversion, such as signal or noise. In this case, these filters are better described as annihilation filters, since the inversion depends on the condition that the filter applied to some data annihilates, or zeros, a particular class of information to a good approximation. For example, a signal $\sv s$ may be characterized by a signal annihilation filter expressed as a matrix $\st S$, so that $\st S \sv s \approx \sv 0$, which may be expressed as $\st S\sv s = \sv r$, where $\sv r$ is small compared to $\sv s$. A noise $\sv n$ may be characterized by a noise annihilation filter $\st N$, so that $\st N \sv n \approx \sv 0$. Examples of annihilation filters used to characterize signal and noise will be shown in chapters [*], [*], [*], and [*].
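
As a toy illustration of an annihilation filter (my example, not one from the chapters cited), a sampled sinusoid of known frequency is annihilated exactly by a three-term filter:
\begin{verbatim}
import numpy as np

# A sinusoid s_j = cos(w j + phase) satisfies s_j - 2 cos(w) s_{j-1} + s_{j-2} = 0,
# so the filter (1, -2 cos(w), 1) is an annihilation filter for it: S s ~= 0.
w = 0.3                                       # hypothetical frequency (radians/sample)
s = np.cos(w * np.arange(500) + 0.7)

S = np.array([1.0, -2.0 * np.cos(w), 1.0])
r = np.convolve(S, s, mode='valid')
print(np.max(np.abs(r)), np.max(np.abs(s)))   # r is tiny compared with s
\end{verbatim}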

In this thesis, prediction-error filters will be referred to as annihilation filters when used in an inversion context. Although the two kinds of filter are calculated in the same way, their use in simple filtering and in inversion is quite different. Simple filtering, whether one-, two-, or three-dimensional, involves samples that are relatively close to the output point and makes some simplifying assumptions. An important assumption is that the prediction error is not affected by the application of the filter. Inversion requires that the filters describe the data, and the characterization of the data is less local than it is with simple filtering. The assumption that the prediction error is not affected by the filter can be relaxed in inversion, a topic to be considered further in chapters [*] and [*].

