%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
%+                                                                           +
%+  NOTICE: This article is also available formatted with illustrations.    +
%+                                                                           +
%+  SEE: http://sepwww.stanford.edu/oldreports/sep61/                        +
%+                                                                           +
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\chapter{Time series analysis}
\setcounter{chapter}{\TSA}
%%%%\setcounter{page}{391}
%%%%\footer{SEP-61}
In chapter \CONJ\ we learned of many operators and how conjugates are a first approximation to inverses. In chapter \LS\ we learned how to make inverse operators from conjugate operators by the least-squares (LS) method using conjugate gradients (CG). The many applications of least squares to the convolution operator constitute the subject known as {\it time series analysis}. Here we examine applications of time series analysis to reflection seismograms. These applications further illuminate the theory of least squares in the areas of weighting functions and stabilization.
\par
In the simplest applications, solutions may be most easily found in the frequency domain. When complications arise, the time domain is better: there we use the convolution operator directly with the method of least squares.
\par
A first complicating factor for the frequency domain is a required boundary in the time domain, such as that between past and future, or a requirement that a filter be nonzero only within a stated time interval. Another factor that attracts us to the time domain from the Fourier domain is weighting functions. As we saw at the beginning of chapter \LS, weighting functions are appropriate whenever a signal or image amplitude varies from place to place. Most of the literature on time-series analysis applies to the limited case of uniform weighting functions. Such time series are said to be {\it stationary}. This means that their statistical properties do not change in time. In real life, particularly in the analysis of echoes, signals are never stationary in time and space. Stationarity is a reasonable starting assumption, but you should know how to go beyond it so you can take advantage of the many opportunities that do arise. In order of increasing difficulty in the frequency domain are the following complications:
\begin{itemize}
\item A time boundary such as that between past and future.
\item More time boundaries, such as those delimiting a filter.
\item Non-stationary signal, i.e.~time-variable weighting.
\item Time-axis stretching such as normal moveout.
\end{itemize}
\par
We won't have difficulty with any of these complications here because we'll stay in the time domain and set up and solve optimization problems using the conjugate-gradient (CG) method. Thus we can cope with great complexity in problem formulation and get the right answer without approximations. By contrast, analytic or partly analytic methods can be more economical, though they generally solve somewhat different problems than those given to us by nature.
\section{SHAPING FILTER}
A shaping filter is a simple least-squares filter that converts one waveform into another. Shaping filters arise in a variety of contexts. In the simplest context, predicting one infinitely long time series from another, the shaping filter can be found in the Fourier domain.
\subsection{Source waveform and multiple reflections}
Figure~\ref{tsa-wz24} shows some reflection seismic data recorded at nearly vertical incidence from an arctic ice sheet.
\plot{tsa-wz24}{2.3in}{\FIGS/tsa/wz24}{ (tsa-wz24) Some of the inner offset seismograms from Arctic data set~24 (Yilmaz and Cumro). }
Apparently the initial waveform is somewhat complex, but the water-bottom reflection does not complicate it further. You can confirm this by noticing the water-bottom multiple reflection, i.e.~the wave that bounces first from the water bottom, then from the water surface, and then a second time from the water bottom. This multiple reflection is similar, but with opposite polarity, to the shape of the primary water-bottom reflection. (The opposite polarity results from the reflection at the ocean surface, where the acoustic pressure, the sum of the downgoing wave plus the upgoing wave, vanishes.)
\par
Other data recorded in water of similar depth shows a different reflection behavior. The bottom gives back not a single reflection, but a train of reflections. Let this train of reflections from the ocean {\it Floor} be denoted by $F(Z)$. Instead of looking like $-F(Z)$, the first multiple reflection can look like $-F(Z)^2$. Figure~\ref{tsa-wz30} shows an example.
\plot{tsa-wz30}{2.77in}{\FIGS/tsa/wz30}{ (tsa-wz30) Two displays of a common-shot collection of seismograms from offshore Crete (Yilmaz and Cumro data set 30). Top display is called ``raster'' and bottom display ``wiggle.'' Raw data scaled by $t^2$. }
Notice you can hardly see the soft-mud arrival on the second multiple. More generally, on multiple reflections the ratio of mudstone strength to soft-mud strength increases with multiple order. Mathematically, multiple order is like $n$ and the wave train like $(1+Z)^n$. Notice in $(1+Z)^n$ that the ratio of the coefficient of $Z^1$ to the coefficient of $Z^0$ is $n$. Physically, in a multiple reflection there is only one water-bottom path, but more paths to the mudstone. \todo{fig needed showing two pegleg paths.}
\par
Figures \ref{tsa-wz30} and \ref{tsa-wz24} illustrate a typical contrast of arctic data with data from temperate or tropical regions. The arctic water-bottom reflection is generally hard, indicating that the bottom is in a constant state of erosion from the scraping of the ice floes and the carrying away of sediments by the bottom currents. In temperate and tropical climates, by contrast, the bottom is often covered with soft sediments, the top layer being unconsolidated mud and deeper layers being mud consolidated into mudstone.
\par
Now we devise a simple mathematical model for the multiple reflections in Figures~\ref{tsa-wz24} and~\ref{tsa-wz30}. There are two unknown waveforms, the source waveform $S(Z)$ and the ocean-floor reflection $F(Z)$. The water-bottom primary reflection $P(Z)$ is the convolution of the source waveform with the water-bottom response, so $P(Z)=S(Z)F(Z)$. The first multiple reflection $M(Z)$ sees the same source waveform, the ocean floor, a minus one for the free surface, and the ocean floor again. So the observations $P(Z)$ and $M(Z)$ as functions of the physical parameters are
\begin{eqnarray}
P(Z)&=&S(Z)\,F(Z) \label{tsa.PP} \\
M(Z)&=&-S(Z)\,F(Z)^2 \label{tsa.MM}
\end{eqnarray}
In Figure~\ref{tsa-wz24} it appears that $F(Z)$ is nearly an impulse, whereas Figure~\ref{tsa-wz30} is dominated by the nonimpulsive aspect of $F(Z)$. Algebraically the solutions of equations (\ref{tsa.PP}) and (\ref{tsa.MM}) are
\begin{eqnarray}
F(Z)&=&-M(Z)/P(Z) \label{tsa.FF} \\
S(Z)&=&-P(Z)^2/M(Z) \label{tsa.SS}
\end{eqnarray}
\par
These solutions can be computed in the Fourier domain as described earlier with Figure~\ref{ls-harlan2}.
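\par
For concreteness, a stabilized spectral division of this kind might be sketched as follows. This is only an illustration, not one of the programs of this book; the function name, the array names, and the stabilization parameter {\tt eps} are assumptions of the sketch.
\begin{verbatim}
import numpy as np

def spectral_divide(num, den, eps=0.01):
    """Stabilized division num/den carried out in the frequency domain."""
    n = len(num) + len(den)            # a longer transform reduces wraparound artifacts
    N = np.fft.rfft(num, n)
    D = np.fft.rfft(den, n)
    # divide, protecting against near-zero values of the denominator spectrum
    Q = N * np.conj(D) / (np.abs(D)**2 + eps * np.max(np.abs(D))**2)
    return np.fft.irfft(Q, n)

# For the water-bottom response F(Z) = -M(Z)/P(Z): given a primary trace P and a
# first-multiple trace M, an estimate would be  F = spectral_divide(-M, P).
\end{verbatim}
The {\tt eps} term is a crude stand-in for the spectral smoothing mentioned just below; without it, notches in the spectrum of the denominator make the quotient unrealistically long and noisy.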
Alternately, the method of least squares can be used. The Fourier method seems slightly clearer, but a disadvantage of the Fourier method is that the answers for $S(Z)$ and $F(Z)$ could turn out to be unrealistically long in the time domain. The Fourier method would probably need to be modified by spectral smoothing to shorten the response functions. (The relation of spectral roughness to signal duration is illustrated in chapter \RAND.) The onset and duration of the filters $S(Z)$ and $F(Z)$ are much more readily controlled by a time-domain method. The time-domain solutions are called ``shaping filters.''
\par
To use least squares we begin by rewriting (\ref{tsa.FF}) and (\ref{tsa.SS}) as
\begin{eqnarray}
P(Z)F(Z)&\approx &-M(Z) \label{tsa.floor} \\
M(Z)S(Z)&\approx &-P(Z)^2 \label{tsa.source}
\end{eqnarray}
The unknowns $F(Z)$ and $S(Z)$ are readily constrained to lie in any time window by expressing either (\ref{tsa.floor}) or (\ref{tsa.source}) as a transient convolution like equation~(\ref{conj.contran}). We can easily use the method of conjugate gradients to solve each system by employing the program {\tt contran()} on page~\pageref{contran}. The conjugate-gradient program {\tt cgmeth()} on page~\pageref{cgmeth} needs only to have the matrix multiply replaced by {\tt contran()}, as we did with the missing-data program {\tt miss1()} on page~\pageref{miss1}. Thus
\progblock{shaper}{\PROG/shaper.rt}
\par
The goal of finding the filters $F(Z)$ and $S(Z)$ is to best model the multiple reflections so they can be subtracted from the data, enabling us to see what primary reflections have been hidden by the multiples. Typical data includes not only that shown, but also wider source-receiver separations as well as many other nearby shots and their receivers. Corrections need to be made for the hyperbolic traveltime resulting from the lateral separation between shot and receiver. Diffractions are a result of lateral imperfections in the generally flat sea floor. These spatial aspects of the topic are considered at great length in my earlier book ``\IEI,'' and we will investigate them here only in a limited way.
\subsection{Shaping a ghost to a spike}
An exasperating problem in seismology is the ``ghost'' problem, in which a waveform is replicated a moment later because of a strong nearby reflector. In marine seismology the nearby reflector is the sea surface. It is near to both the sound gun and the hydrophones, thus giving two ghosts. Upgoing and downgoing waves at the sea surface have opposite polarity because their pressures combine to zero at the surface. So waves seen by the hydrophone encounter the ghost operator $g_t=(1,0,0, \cdots ,-1)$ twice, once for the surface near the source and once for that near the hydrophone. The number of zeros is typically small, depending on the depth of the device. The sound receivers can be kept away from surface water-wave noise by positioning them deeper, but that extends the ghost delay, and as we'll see, this ghost is really hard to eliminate by processing. Suppose the ghost is $G(Z)=1-Z^2$. Theoretically the inverse is of infinite duration, namely $(1,0,1,0,1,0,1,0,1, \cdots)$.
\par
Since an infinitely long operator is not satisfactory, I used the program {\tt shaper()} above to solve a least-squares problem for an antighost operator of finite duration. Since you know the least-{\it squares} method abhors large errors and tends to equalize the errors, you should be able to guess the result.
The filter $(.9,0,.8,0,.7,0,.6,0,.5,0,.4,0,.3,0,.2,0,.1)$, when convolved with $(1,0,-1)$, produces equal squared errors of $.01$ at each output time.
\par
The poor result can be described in the Fourier domain by the zeros in the spectrum of $(1,0,-1)$. Since you can't divide by zero, you shouldn't try to divide by $1-Z^n$, which has zeros uniformly distributed around the unit circle. The method of least squares prevents disaster, but it cannot perform magic.
\par
I don't know whether you consider this result satisfactory, but I consider it a problem in search of a different solution. This problem also arises with seismograms recorded in a shallow borehole. As mentioned, the total problem generally includes many waveforms propagating in many directions, so the whole problem is not as one-dimensional as it may appear from Figures~\ref{tsa-wz30} and \ref{tsa-wz24}, in which I did not display the wide-offset signals.
\exer
\item The velocity of sound in water is about 1485 meters/sec. How many zeros are in the ghost filter for geophones at 10 meters depth with data sampled at 2 milliseconds?
\item Define the inputs to subroutine {\tt shaper()} that gave the filter $(.9,0,.8, \cdots ,.1)$ mentioned above.
\exerend
\section{SYNTHETIC DATA FROM FILTERED NOISE}
A basic model for describing the random character of signals is to regard them as random numbers passed into a filter. Practical work often consists of the reverse: deducing the filter and deconvolving it to see the input.
\subsection{Gaussian signals versus sparse signals}
Most theoretical work is based on random numbers from a Gaussian probability function. The basic theoretical model is that at {\it every} time point a Gaussian random number is produced. In real life we observe such signals, but we also observe signals with a sparser collection of bursts. These bursty signals can be described in either of two ways: (1) many points have zero excitation (or an excitation that is smaller than expected from a Gaussian), or (2) the Gaussian probability function describes the many smaller values, but some larger values are noticeably present.
\par
It turns out that the Gaussian probability function makes the most cryptic signals. It also turns out that theory is best developed for the Gaussian case. Thus Gaussian theory, which is the most pessimistic, tends to be applied to both Gaussian and sparser data. Sparse signals derive from a diverse collection of models, but usually there isn't enough information to establish a convincing model. In practical work, ``non-Gaussian'' generally means ``sparser than Gaussian.''
\plot{tsa-spikes}{2.26in}{\FIGS/tsa/spikes}{ (tsa-spikes) Left is random numbers from a Gaussian probability function. (The random numbers are connected by lines.) Right, the random numbers are cubed, making a signal in which large spikes are sparser. }
Figure~\ref{tsa-spikes} illustrates random signals from a Gaussian probability function and a sparser signal made by cubing the random numbers that emerge from a Gaussian random-number generator.
\subsection{Random numbers into a filter}
Figure~\ref{tsa-leaky} shows random numbers put into a leaky integrator and the resulting spectral amplitude.
\plot{tsa-leaky}{2.2in}{\FIGS/tsa/leaky}{ (tsa-leaky) Left is sparse random noise passed through a leaky integrator. Right is the amplitude spectrum of the output. }
The output spectral amplitude of an integrator would be $|\omega|^{-1}$, but the decay constant in the leaky integrator gives instead the amplitude $(\omega^2+\epsilon^2)^{-1/2}$.
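\par
A minimal sketch of how such a panel can be synthesized follows; it is illustrative only, with the leak constant {\tt rho} and the cubing for sparseness as the only essential ingredients.
\begin{verbatim}
import numpy as np

def leaky_integrate(x, rho=0.95):
    """Leaky integration y_t = x_t + rho * y_{t-1}; the impulse response decays like rho**t."""
    y = np.zeros(len(x))
    acc = 0.0
    for t, xt in enumerate(x):
        acc = xt + rho * acc
        y[t] = acc
    return y

rng = np.random.default_rng(0)
sparse = rng.standard_normal(512)**3      # cubing makes the large spikes sparser
trace = leaky_integrate(sparse)           # damped exponentials follow each big spike
spectrum = np.abs(np.fft.rfft(trace))     # roughly (omega**2 + eps**2)**(-1/2) in shape
\end{verbatim}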
Since the random numbers are sparse, you can see damped exponentials in the data itself. This enables you to confirm the direction of the time axis. If the random numbers had been Gaussian, the expected spectrum would be the same, but you would not be able to see the damped exponentials or detect the direction of time.
\subsection{Random numbers into the seismic spectral band}
Figure~\ref{tsa-band} shows synthetic data designed to look like real seismic noise.
\plot{tsa-band}{2.2in}{\FIGS/tsa/band}{ (tsa-band) Left is Gaussian random noise passed through Butterworth filters to simulate the seismic pass band. Right is the amplitude spectrum of the output. }
Here some Gaussian random numbers are passed into a filter to simulate the seismic passband. Two five-term Butterworth filters were used, a highcut at .4 of the Nyquist and a lowcut at .1 of the Nyquist.
\section{BLIND DECONVOLUTION}
Blind deconvolution presumes the ``white noise into a filter'' model and proceeds to estimate the filter and deconvolve it. Equivalently, an inverse filter is estimated and convolved. The basic idea is simple, but complications arise in the improvements. Chapter \RAND\ shows that the spectrum of random noise is also noise, but positive. Smoothing it approximates the mathematical expectation operator and makes the spectrum tend to a constant. Thus the spectrum of the output of random noise put into a filter is like the spectrum of the filter, but it is jagged because of the noise. This suggests smoothing that spectrum and using it as the spectrum of the unknown filter.
\subsection{Blind deconvolution by Fourier methods}
\par
An estimate of the blind deconvolution filter generally begins with the smoothed amplitude spectrum of the observations, or equivalently with a tapered autocorrelation function. The simplest guess is to inverse transform the spectrum to a time function. The time-domain wavelet (found by inverse transforming the amplitude spectrum) will be a symmetrical signal, but in real life the wavelet should be causal. A causal filter can be created by the straightforward numerical methods of this chapter, or by the more mathematical methods of other chapters. Chapter \SPEC\ shows a Fourier method, called the Kolmogoroff method, for finding a causal wavelet of a given spectrum. Chapter \RAND\ shows that the length of the Kolmogoroff wavelet depends on the amount of spectral smoothing, which in this chapter is like the ratio of the data length to the filter length.
\par
In the blind deconvolution problem, Fourier methods determine the spectrum of the unknown wavelet, but they seem to be unable to determine its phase from measurements; they can only assert it by theory. This is a limitation of the ``stationarity'' assumption, that signal strengths are uniform in time. Where signal strengths are nonuniform, better results can be found with weighting functions and time-domain methods. In Figure~\ref{tsa-dallpass} we'll see that the all-pass filter becomes visible when we take the trouble to apply appropriate weights.
\subsection{The error filter family}
There is a wide variety of least-squares filters. The most important one is the prediction-error (PE) filter. A PE filter has the form $(1, a_1,a_2,a_3, \cdots ,a_n)$, where the coefficients $a_j$ are found by a least-squares procedure so that the output of the filter has minimum power. The filter $(-a_1,-a_2,-a_3, \cdots , -a_n)$ is a prediction filter for the point off its end.
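\par
As a rough illustration of how such coefficients might be found, here is a sketch that uses a generic dense least-squares solver rather than the conjugate-gradient programs of this chapter. The function name {\tt pef} and all details are assumptions of this sketch, not a routine from this book.
\begin{verbatim}
import numpy as np

def pef(x, n):
    """Estimate a prediction-error filter (1, a_1, ..., a_n) for the series x.

    The a_j minimize the sum over t of (x_t + a_1 x_{t-1} + ... + a_n x_{t-n})**2,
    so (-a_1, ..., -a_n) is a prediction filter for the point off its end.
    """
    x = np.asarray(x, dtype=float)
    lagged = [x[n - j - 1 : len(x) - j - 1] for j in range(n)]  # columns x_{t-1}, ..., x_{t-n}
    X = np.column_stack(lagged)
    d = x[n:]                                    # the samples being predicted
    a, *_ = np.linalg.lstsq(X, -d, rcond=None)   # least-squares fit of lagged data to -x_t
    return np.concatenate(([1.0], a))

# np.convolve(x, pef(x, 10))[:len(x)] is then the prediction-error output.
\end{verbatim}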
The prediction-error filter is the time-domain counterpart of the Kolmogoroff wavelet of chapter \SPEC.
\par
A variant form of the PE filter is the gapped PE filter, which, for a gap of two, takes the form $(1, 0, 0, a_3,a_4,\cdots ,a_n)$. Another filter that we'll encounter is the interpolation-error filter. Its coefficients are $(a_{-m}, \cdots, a_{-2}, a_{-1}, 1, a_1, a_2, a_3, \cdots ,a_n)$. It can be gapped too. The program will be discussed after a collection of examples.
\subsubsection{Prediction-error (PE) filters on synthetic data}
The idea of gapping a prediction-error filter is to moderate the goal of converting realistic signals into perfect impulses. Figure~\ref{tsa-dleak} shows synthetic data,
\plot{tsa-dleak}{1.8in}{\FIGS/tsa/dleak}{ (tsa-dleak) Deconvolution of leaky integrator signals with PE filters of various prediction-gap sizes. Inputs and outputs on alternate traces. Gap size increases from left to right. }
sparse noise into a leaky integrator, and deconvolutions with prediction-error filters. Theoretically the filters should turn out to be $1-.9^{\rm gap}Z^{\rm gap}$. The result is generally successful, with the degree of success varying from trace to trace, as seen in the filters obtained.
\par
To see what happens when an unrealistic deconvolution goal is set for prediction error, we could try to compress a wavelet that is resistant to compression, for example the impulse response of a Butterworth bandpass filter. The perfect filter to compress any wavelet is its inverse. But the Butterworth filter has a wide region of its spectrum where it is nearly zero, so any presumed inverse to it must nearly divide by zero over that range. Compressing a Butterworth filter is so tough that I omit the random numbers used in Figure~\ref{tsa-dleak} and apply prediction error to the Butterworth response itself, in Figure~\ref{tsa-dbutter}.
\sideplot{tsa-dbutter}{1.92in}{\FIGS/tsa/dbutter}{ (tsa-dbutter) Butterworth deconvolution by prediction error. }
So we have seen that gapped PE filters sometimes are able to compress a wavelet, and sometimes not. In real life, resonances arise in the earth's shallow layers and, as we'll see, the resonant filter responses can be shortened by PE filters.
\subsubsection{PEF on field data}
\par
Figure~\ref{tsa-wz14} is a nice example illustrating the utility of prediction-error filters.
\plot{tsa-wz14}{2.7in}{\FIGS/tsa/wz14}{ (tsa-wz14) Data from offshore Texas (extracted from Yilmaz and Cumro dataset 14). Wiggle display above and raster below. Inputs above outputs. Filters displayed on the right. }
The input is quasi-sinusoidal, which indicates that PE filtering should be successful. Indeed, some events are uncovered that would not likely have been identified on the input. In this figure, a separate problem is solved for each trace, and the resulting filter is shown on the right.
\subsection{The prediction-error filter output is white}
\par
The most important property of a PE filter is that its output tends to a white spectrum. No matter what the input to this filter, the output tends to whiteness as the number of coefficients $n$ tends to infinity. Thus, the prediction-error filter adapts itself to the input by absorbing all its color. If the input is already white, the $a_j$ coefficients vanish---the PE filter is frustrated because with a white input it can predict nothing, so the output is the same as the input. Thus, if you were to cascade one PE filter after another, you would find that only the first filter does anything.
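\par
A quick numerical check of this statement, reusing the hypothetical {\tt pef()} sketch shown earlier (all names and numbers here are illustrative assumptions): estimate a PE filter from colored noise, apply it, then estimate a second PE filter from the output.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
white = rng.standard_normal(2000)
leak = 0.9**np.arange(60)                      # truncated leaky-integrator response
colored = np.convolve(white, leak)[:2000]      # colored noise: white noise into a filter

a1 = pef(colored, 20)                          # pef() as sketched earlier
out1 = np.convolve(colored, a1)[:2000]         # prediction error: close to white
a2 = pef(out1, 20)                             # a second PE filter cascaded on that output

print(np.round(a1[:3], 2))                     # about [1, -0.9, 0]: the color is absorbed
print(np.round(np.max(np.abs(a2[1:])), 2))     # should be small: nothing left to predict
\end{verbatim}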
If the input is a sinusoid, it is exactly predictable by a three-term recurrence relation, so all the color is absorbed by a three-term PE filter (see exercises). The power of a PE filter is that a short filter can often extinguish, and thereby represent, the information in a long filter.
\par
It is a useful practical feature that the output of a PE filter is white. Imagine the reverberation of the soil layer, highly variable from place to place, as the resonance between the surface and deeper consolidated rocks varies rapidly with surface location as a result of geologically recent fluvial activity. The spectral color of this erratic variation on surface-recorded seismograms is compensated by a PE filter. Of course, it is not desired that PE-filtered seismograms be white, but once they all have the same spectrum, it is easy to postfilter them to any desired spectrum.
\par
Since the output spectrum of a PE filter is white, the PE filter itself must have a spectrum inverse to that of its input. Indeed, an effective mechanism of spectral estimation, developed by John P. Burg and described in chapter \TOE, is to compute a PE filter and look at the inverse of its spectrum.
\par
Another interesting property of the PE filter is that it is minimum phase. The best proofs of this property are found in my earlier book, ``\FGDP.'' The proofs assume uniform weighting functions.
\subsection{Non-whiteness of gapped PEF output}
When a PE filter is constrained to have a few of its near-zero-lag coefficients be zero, the output no longer tends to be white as the number of coefficients in the filter tends to infinity. If $f_1$, the filter coefficient of $Z = e^{i \omega \Delta t}$, vanishes, then $F(\omega)$ lacks the slow variation in $\omega$ that this term provides. It lacks just the kind of spectral variation that could boost weak near-Nyquist noises up to the strength of the main pass band. With such variation absent by the constraint, the growth of Nyquist-region energy is no longer a necessary byproduct of PE filtering.
\par
Figure~\ref{tsa-wz27} illustrates a PE filter with a long gap.
\plot{tsa-wz27}{2.63in}{\FIGS/tsa/wz27}{ (tsa-wz27) Data from offshore Canada (extracted from Yilmaz and Cumro dataset 27) processed by gapped prediction error. Inputs above outputs. Filters displayed on the right. Nicely suppressed multiples in boxes. Badly suppressed multiples above diagonal lines. }
(The gap was chosen to be a little less than the water depth.) This example nicely shows the suppression of some multiple reflections, but unfortunately I don't see any primary reflections that have been uncovered. Because the prediction gap is so long, the filter causes no visible change to the overall spectrum. Notice how much more the spectrum was broadened by the shorter-gapped filter in Figure~\ref{tsa-wz14}. The theoretical association of prediction-gap width with spectral broadening is examined next. Another interesting feature of Figure~\ref{tsa-wz27} that we will investigate later is a geometrical effect. This shows up as poor multiple removal on and above the diagonal lines. It happens because of the nonzero separation of the sound source and receiver.
\subsection{Proof that the output of a PE filter is white}
\label{'white_proof'}
The proof is lengthy, but it is worthwhile because it reviews many basic concepts from chapters \CONV\ and \ZZ. You can skip the proof below, which probably belongs in a theory chapter.
\par
The basic idea of least-squares fitting is that the residual is orthogonal to the fitting functions. Applied to the PE filter, this idea means that the output of the PE filter is orthogonal to lagged inputs. The orthogonality applies only for lags in the past, because prediction knows only the past while it aims at the future. What we want to show is slightly different, namely that the output is uncorrelated with {\it itself} (not with the input) for lags in {\it both} directions.
\par
By way of background, first consider two filters that are minimum-phase and mutually inverse, $A(Z)B(Z)=1$, so $A=1/B$ and $\overline{A} = 1/\overline{B}$. Their spectra are also inverses, so $\overline{A}\overline{B} B A = 1$. We will later recognize $BA$ as a white PE output. Dividing $\overline{A}\overline{B}BA$ through by $\overline{A}$ gives
\begin{equation}
[\overline{B}(1/Z)] [B(Z) A(Z)] \eq { 1 \over \overline{A}(1/Z)} \eq \overline{B}(1/Z) \eq b_0 +b_1Z^{-1} +b_2Z^{-2} + \cdots
\label{tsa.oneside}
\end{equation}
whose coefficients vanish for positive powers of $Z$ but not for negative powers. We'll come to associate $[\overline{B}(1/Z)]$ with lagged input and $[B(Z)A(Z)]$ with PEF output. The right side of~(\ref{tsa.oneside}) shows they are uncorrelated in one sense of time, but not the other.
\par
For the proof we take a signal $X(Z)$ that by hypothesis is the output of white noise into an unknown filter $B(Z)$. The calculation begins from the autocorrelation of $x_t$, which is expected to be the autocorrelation of $b_t$, i.e.~$\E [\overline{X}X] = \overline{B} B$. (More about the statistical expectation ``\E'' is found in chapter \RAND.) Since any spectrum has a minimum-phase factorization, the given information is unchanged if we take $B(Z)$ to be minimum phase. We define the PEF by minimizing the power in $X(Z)A(Z)$. This power is the coefficient of $Z^0$ in the spectrum
$$ {\rm power} \eq {\rm coef~of~} Z^0 {\rm ~in~} \overline{A}(1/Z) \overline{B}(1/Z) B(Z) A(Z) $$
Setting to zero the derivative with respect to each coefficient in $\overline{A}$ (expanded details in chapter \TOE) gives the zero coefficients on the positive powers of $Z$ in equation~(\ref{tsa.oneside}). To show that the output $X(Z)A(Z)$ is uncorrelated with itself, we take equation~(\ref{tsa.oneside}), replace $\overline{B} B$ by $\E [\overline{X} X]$, and then multiply by $A(1/Z)$, obtaining the white-output statement $\E\,[A(1/Z)X(1/Z)]\,[X(Z)A(Z)]=1$.
\subsection{Postcoloring versus prewhitening}
The output of a PE filter is white (unless it is gapped). But people don't like to look at white signals. Signals are normally sampled adequately densely, which means that their spectra are small anywhere near the Nyquist frequency. There is rarely energy above half the Nyquist frequency, and generally little but marine noise above a quarter of it. To avoid boosting these noises, the ungapped PE filter is generally altered or accompanied by other filters. Three common approaches are:
\begin{itemize}
\item Use a gapped filter.
\item Deconvolve, then apply a filter with the desired spectrum $S(\omega)$.
\item Prefilter the input with $S(\omega)^{-1}$, then deconvolve with an ungapped PEF, and finally postfilter with $S(\omega)$.
\end{itemize}
The last process is called ``prewhitening,'' a name whose motivation is somewhat involved. The idea seems to be that the prefilter removes known color so that the least-squares coefficients are not ``wasted'' predicting what is already known.
So theoretically the prefilter spectrum $S(\omega)^{-1}$ is the inverse of the prior estimate of the input spectrum. In real life, that is merely an average of estimates from other data. If the desired output spectrum does not happen to be $S(\omega)$, it doesn't matter, since any final display filter can be used. Although this is a nice idea, I have no example to illustrate it.
\par
There is also the question of what phase the postfilter should have. Here are some cautions against the two obvious choices:
\begin{itemize}
\item symmetrical. A symmetrical filter has a noncausal response.
\item causal. If a later step of processing is to make a coherency analysis for velocity versus time, then the effective time will be more like the signal maximum than the first break.
\end{itemize}
Since the postfilter is broadband, its phase is not as important as that of the deconvolution operator, which tries to undo the phase of a causal and resonant earth.
\section{WEIGHTED ERROR FILTER CALCULATION}
Here we'll look at the motivation for introducing weighting functions into filter analysis and see how to write the CG program that produced most of the examples in this chapter.
\subsection{AGC (automatic gain control)}
\label{'AGC'}
Echoes get weaker with time, though the information content is roughly independent of the signal strength. Echoes also vary in strength as different materials are encountered by the outgoing wave. Programs for echo analysis typically divide the data by a scaling factor that is a smoothed average of the signal strength. This practice is nearly universal, though it is fraught with hazards. An example of an AGC would be to compute the divisor by forming the absolute value of the signal and then smoothing it with the program {\tt triangle()} on page~\pageref{triangle} or the program {\tt leaky2()} on page~\pageref{leaky2}. Pitfalls are the strange amplitude behavior surrounding the water bottom, and the overall loss of the information contained in amplitudes. Personally, I have found that the gain function $t^2$ nearly always eliminates the need for AGC, but I have no doubt that AGC is occasionally needed. (A theoretical explanation for $t^2$ is found in my other book ``\IEI.'')
\subsection{Gain before or after convolution}
It is common practice to apply AGC to echo soundings before filter analysis. This is a questionable practice. The proper practice is to analyze first according to the laws of physics and only at the last stage to apply gain functions for purposes of statistical estimation and final display. Here we'll examine correct and approximate ways of setting up deconvolution problems with gain functions. Then we'll use CG to solve the proper formulation.
\par
Solving problems in the time domain offers an advantage over the frequency domain because in the time domain it is easy to control the interval where the solution should exist. Another advantage of the time domain arises when weighting functions are appropriate. I've noticed that people sometimes use Fourier solutions, thereby forcing themselves to use uniform weighting when another weighting would be more appropriate. Since we look at echoes, it is unavoidable that we apply gain functions. Weighting is always justified on the process {\it outputs}, but it is an approximation of unknown validity on the data that is {\it input} to those processes. I'll clarify this approximation with an equation having two filter points and an output of three time points.
In real-life applications the output is typically one to two thousand points and the filter 5 to 50 points. The valid formulation of a filtering problem is
\begin{equation}
\left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right]
\quad \approx \quad
\left[ \begin{array}{ccc} w_1 & 0 & 0 \\ 0 & w_2 & 0 \\ 0 & 0 & w_3 \end{array} \right]
\left(
\left[ \begin{array}{c} d_1 \\ d_2 \\ d_3 \end{array} \right]
\ - \
\left[ \begin{array}{cc} b_1 & 0 \\ b_2 & b_1 \\ 0 & b_2 \end{array} \right]
\ \left[ \begin{array}{c} f_1 \\ f_2 \end{array} \right]
\right)
\label{tsa.weighted}
\end{equation}
The weights $w_t$ are any positive numbers you choose; typically the $w_t$ are chosen so that the residual components are about equal in magnitude.
\par
If instead the weighting function is applied to the inputs, we have an approximation that is somewhat different:
\begin{equation}
\left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right]
\quad \approx \quad
\left[ \begin{array}{c} w_1 d_1 \\ w_2 d_2 \\ w_3 d_3 \end{array} \right]
\ - \
\left[ \begin{array}{cc} w_1 b_1 & 0 \\ w_2 b_2 & w_1 b_1 \\ 0 & w_2 b_2 \end{array} \right]
\ \left[ \begin{array}{c} f_1 \\ f_2 \end{array} \right]
\label{tsa.kludge}
\end{equation}
Comparing the weighted output-residual equation~(\ref{tsa.weighted}) to the weighted input-data equation~(\ref{tsa.kludge}), we note that the right column of the matrix in~(\ref{tsa.kludge}) does not match that in~(\ref{tsa.weighted}). If the right-hand column in~(\ref{tsa.kludge}) had been defined as $(0, w_2 b_1, w_3 b_2)'$ instead of $(0, w_1 b_1, w_2 b_2)'$, then the equations would match, but then the matrix in~(\ref{tsa.kludge}) would not be a simple convolution and some fast solution methods would not apply. (Such fast methods include both Fourier and Toeplitz methods, to be described in chapter \TOE.) The approximation~(\ref{tsa.kludge}) becomes reasonable when the weights are slowly variable, i.e.~$w_t$ is a slowly variable function of $t$. In practice, I think the approximation is often justified for a slow $t^2$ gain but questionable for automatic gains that are faster. The approximation (\ref{tsa.kludge}) enables the use of Fourier methods, because the problem still retains an entirely convolutional form. Compared to Toeplitz methods of solving equation~(\ref{tsa.kludge}), the CG method of solving~(\ref{tsa.weighted}) is slower. The Toeplitz cost is proportional to the number of unknowns instead of its square. Here we are going to see how to solve the problem correctly. If you want to solve the correct problem rapidly, perhaps you can do so by solving the approximate problem first by a quasi-analytic method and then doing a few steps of CG.
\subsection{Setting up any weighted CG program}
Equation~(\ref{tsa.weighted}) is of the form $\bold 0 \approx \bold W (\bold d - \bold B\bold f)$. This can be converted to a new problem without weights by defining a new data vector $\bold W\bold d$ and a new operator $\bold W\bold B$, simply by carrying $\bold W$ through the parentheses to $\bold 0 \approx \bold W \bold d - (\bold W \bold B)\bold f$. Thus, you can use the same program {\tt shaper()} that we used for unweighted least squares. Alternately we can keep the old data and old operator and put the weighting function into the CG program as below. You will notice that the convolution program {\tt contrunc(0,...)} has been replaced by itself followed by a diagonal weighting. Likewise, since the conjugate operator of $\bold W\bold B$ is $\bold B' \bold W'$, we see {\tt contrunc(1,...)} replaced by itself {\it preceded} by the diagonal weighting.
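\par
In code, the wrapping can be as simple as the following sketch. Here a simplified transient convolution and its adjoint stand in for {\tt contrunc()}; the function names and the dot-product test at the end are assumptions of this sketch, not routines from this book.
\begin{verbatim}
import numpy as np

def conv_forward(f, b, nd):
    """Transient convolution d = B f, truncated (or zero padded) to nd output samples."""
    full = np.convolve(b, f)
    out = np.zeros(nd)
    n = min(nd, len(full))
    out[:n] = full[:n]
    return out

def conv_adjoint(r, b, nf):
    """Adjoint of the truncated convolution: f = B' r (a crosscorrelation)."""
    return np.correlate(r, b, mode="full")[len(b) - 1 : len(b) - 1 + nf]

def weighted_forward(f, b, w):
    return w * conv_forward(f, b, len(w))      # W B f: convolve, then weight

def weighted_adjoint(r, b, w, nf):
    return conv_adjoint(w * r, b, nf)          # B' W' r: weight first, then the adjoint

# Dot-product test of the pair (the two numbers should agree to roundoff):
rng = np.random.default_rng(0)
b, w = rng.standard_normal(5), rng.uniform(0.5, 2.0, 30)
f, r = rng.standard_normal(10), rng.standard_normal(30)
print(np.dot(r, weighted_forward(f, b, w)), np.dot(weighted_adjoint(r, b, w, 10), f))
\end{verbatim}
The dot-product test is the standard check that the weighted forward and adjoint operators really are transposes of one another.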
\subsection{Calculating error filters}
The {\it error} in prediction (or interpolation) is often more interesting than the prediction itself. When the predicted component is removed, leaving the unpredictable part, the residual is called the prediction error. Let us see how the program {\tt shaper()} can be used to find an interpolation-error filter like $(f_{-2},f_{-1},1,f_1,f_2)$. The statement of wishes is:
\begin{equation}
\left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{array} \right]
\quad \approx \quad
\left[ \begin{array}{c} . \\ . \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ . \\ . \end{array} \right]
\ \ +\ \
\left[ \begin{array}{cccc}
x_1 & . & . & . \\
x_2 & x_1 & . & . \\
x_3 & x_2 & . & . \\
x_4 & x_3 & x_1 & . \\
x_5 & x_4 & x_2 & x_1 \\
x_6 & x_5 & x_3 & x_2 \\
. & x_6 & x_4 & x_3 \\
. & . & x_5 & x_4 \\
. & . & x_6 & x_5 \\
. & . & . & x_6
\end{array} \right]
\ \left[ \begin{array}{c} f_{-2} \\ f_{-1} \\ f_1 \\ f_2 \end{array} \right]
\label{tsa.almost}
\end{equation}
Changing the polarity of all the matrix elements and bringing the matrix to the other side of the equation gives us the standard form of overdetermined simultaneous equations. After solving this system for $(f_{-2},f_{-1},f_1,f_2)$, we insert the ``1'' to make the IE filter $(f_{-2},f_{-1},1,f_1,f_2)$, which, applied to the data $x_t$, gives the desired IE output.
\par
Notice that the matrix in (\ref{tsa.almost}) is {\it almost} a convolution. It would be a convolution except that the central column is absent. I propose that you not actually solve the system~(\ref{tsa.almost}); instead I'll show you a more general solution that uses the convolution operator itself. That way you won't need to write programs for the many ``almost'' convolution operators arising from the many PE and IE filters with various gaps and lags.
\par
The conjugate-gradient program is a combination of earlier CG programs and the weighting methods we must introduce here. \todo{Want to introduce weighting in CG long before IE filtering}
\begin{itemize}
\item We need to constrain a filter coefficient to be unity, which we can do by initializing it to be unity and then allowing no changes to it.
\item We may wish to constrain some other filter coefficients to be zero (gapping) by initializing them to zero and allowing no changes to them.
\item The polarity-convention change for the unknowns can be put into the operator by replacing the operator with the product of itself and a sign-change operator.
\item We may want the output point to be someplace other than just off the end (as in prediction). Thus we will specify a time lag that denotes the predicted or interpolated time point. The program {\tt contrunc()} is designed for this.
\end{itemize}
Incorporating all these features into {\tt shaper()}, we get {\tt iner()}.
\progblock{iner}{\PROG/iner.rt}
For a filter of the form $(1, f_1,f_2, \cdots ,f_{n-1})$ you would specify {\tt lag=1, gap1=1, gapn=1}. For a filter of the form $(1, 0, f_2, \cdots ,f_{n-1})$ you would specify {\tt lag=1, gap1=1, gapn=2}. For a filter of the form $(f_{-2},f_{-1},1,f_1,f_2)$ you would specify {\tt nf=5, lag=3, gap1=3, gapn=3}.
\par
This program uses the convolution program {\tt contrunc()}, which is handy in practice because its output has the same length as its input.
This convenience is partly offset by a small danger that significant output energy in the ``start-up'' and ``off-end'' zones could be truncated. In terms of equation~(\ref{tsa.almost}), the routine {\tt contrunc()} drops some top and/or bottom rows.
\subsection{Stabilizing technique}
Earlier, some theory for stabilizing least squares was described with equations (\ref{uni.stabfilt}) and (\ref{ls.knownfilt}). I installed this stabilization along with the filter determinations in this chapter, but as I expected, stabilization in this highly overdetermined application showed no advantages. Nevertheless, it is worth seeing how this is done, particularly since the changes to the program make for more informative plots. The input data is modified by appending a zero-padded impulse at the end. The output will contain the filter impulse response in that region. The spike size is chosen to be compatible with the data size, for convenience of the plotting programs. The weighting function in the appended region is scaled according to how much stabilization is desired. Figure~\ref{tsa-wz33} shows the complete input and residual.
\plot{tsa-wz33}{1.38in}{\FIGS/tsa/wz33}{ (tsa-wz33) Data from the North Sea (extracted from Yilmaz and Cumro dataset 33) processed by prediction error. Rightmost box is weighted according to desired stabilization. The truncation event is weighted by zero. }
It also shows the problem that output data flows beyond the length of the input data because of the nonzero length of the filter. This extra output is undoubtedly affected by the truncation of the data, and its energy should not be part of the energy minimization. So it is weighted by zero.
\exer
\item Given a sinusoidal function $x_t = \cos(\omega t + \phi)$, find a three-term recurrence relationship that predicts $x_t$ from the previous two points, namely $x_t = a_1 x_{t-1} + a_2 x_{t-2}$. Hence deduce that a sinusoid is perfectly predictable given two successive past values. The coefficients depend on $\omega$ but not $\phi$.
\item Figure~\ref{tsa-wz14} has a separate filter for each trace. Consider the problem of finding a single filter for all the traces. What is the basic operator and its conjugate? Assemble these operators using subroutine {\tt contrunc()} on page \pageref{contrunc}.
\item Examine the filters in Figure~\ref{tsa-wz33}. Notice that besides the pulse at the water depth there is another weak one at double that depth. Suggest a physical mechanism. Suggest a mechanism relating to computational approximations.
\exerend
\section{INTERPOLATION ERROR}
An interpolation-error (IE) filter has the form $(a_{-m}, \cdots, a_{-2}, a_{-1}, 1, a_1, a_2, a_3, \cdots ,a_n)$, where the $a_t$ coefficients are adjusted to minimize the power in the filter output. This filter has the strange character that if the input spectrum is $S(\omega)$, then the output spectrum will tend to $S(\omega)^{-1}$. Thus the IE filter tends to turn poles into zeros and vice versa. To see why the IE filter inverts the spectrum of the input, we need only recall the basic premise of least-squares methods, that the residual (the output) is orthogonal to the fitting functions (the input at all lags except the zero lag). Thus the crosscorrelation of the input and the output is an impulse. This can only happen if their spectra are inverses. (If the crosscorrelation $\overline{X}(1/Z)\,Y(Z)$ is a constant $c$, then with $Y(Z)=A(Z)X(Z)$ we have $A = c/(\overline{X}X)$, so the output spectrum is $\overline{Y}Y = \overline{A}A\,\overline{X}X = c^2/(\overline{X}X)$, inverse to the input spectrum up to a scale.) This is a disaster for the overall appearance of a seismogram. Such drastic spectral change can be controlled in a variety of ways, as with PE filters, but here there seems to be little experience besides my own.
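\par
As with the PE filter, a rough dense-solver sketch of the IE-filter estimation may help fix ideas. The function name {\tt ief} and all details are assumptions of this sketch; the book's {\tt iner()} program does this job with conjugate gradients and with the gap and lag options described above.
\begin{verbatim}
import numpy as np

def ief(x, m, n):
    """Interpolation-error filter (a_{-m},...,a_{-1}, 1, a_1,...,a_n) for the series x.

    The a_j minimize the power of  x_t + sum over j != 0 of a_j x_{t-j}.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    lags = [j for j in range(-m, n + 1) if j != 0]
    t = np.arange(n, N - m)                       # times where every needed lag exists
    X = np.column_stack([x[t - j] for j in lags])
    a, *_ = np.linalg.lstsq(X, -x[t], rcond=None)
    filt = np.zeros(m + n + 1)
    filt[m] = 1.0                                 # the constrained "1" at zero lag
    for coef, j in zip(a, lags):
        filt[m + j] = coef
    return filt                                   # filt[i] holds the coefficient at lag i-m
\end{verbatim}
Applying the filter with {\tt np.convolve(x, filt)} gives the IE output delayed by {\tt m} samples, since the two-sided filter is stored as a causal array.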
Figure~\ref{tsa-wz33ie} illustrates an interpolation-error result where gapping has been used to limit the color changes.
\plot{tsa-wz33ie}{2.55in}{\FIGS/tsa/wz33ie}{ (tsa-wz33ie) Data from the North Sea (extracted from Yilmaz and Cumro dataset 33) processed by interpolation error. Inputs above outputs. Filters displayed on the right. }
I also chose the gap to try to condense the wavelet. Your judgment is needed to see if the result is successful. Notice also a high-frequency arrival after the diagonal lines. This shows that the IE filters are boosting very high frequencies despite the gapping.
\par
I am fascinated by the theoretical implications of IE filters. This is not the place to expound on my research in this area. But recall that least-squares problems should be formulated with an inverse-covariance weighting on the residuals, and in signal and image analysis this means the residuals should be white in expectation, i.e.~have an impulsive {\it autocorrelation.} What I have noticed is that the output of an IE filter has an impulsive {\it crosscorrelation} with its input, and I am trying to exploit this to make high-quality estimates. (Look back to Figure~\ref{uni-filt} and imagine the bottom panels being the output of a 2-D IE filter, and imagine, instead of using dot products on the images, using cross products between upper and lower images.) Another fact about IE filters is that they tend to be symmetrical functions.
\subsection{Blind all-pass decon}
A well-established theoretical concept that leads to unwarranted pessimism is the idea that blind deconvolution cannot find an all-pass filter. If we carefully examine the analysis leading to that conclusion, we will find lurking the assumption that the weighting function used in the least-squares estimation is uniform. And when this assumption is wrong, so is the conclusion, as Figure~\ref{tsa-dallpass} shows.
\plot{tsa-dallpass}{1.9in}{\FIGS/tsa/dallpass}{ (tsa-dallpass) Four independent trials of deconvolution of sparse noise into an all-pass filter. Alternate lines are input and output. }
Recall that the inverse of an all-pass filter is its time reverse. The reversed shape of the filter is seen on the inputs where there happen to be isolated spikes.
\par
Let us see what theory predicts can't be done, and then I'll tell you how I did it. If you examine the unweighted least-squares error-filter programs, you notice that the first calculation applies the convolution operator and then its transpose. This takes the autocorrelation of the input and uses it as a gradient search direction. Take a white input and pass it through a phase-shift filter. The autocorrelation of the output should be an impulse function. Since the impulse itself is constrained against rescaling, the effective gradient is zero, the solution, an impulse filter, is already at hand, and so a phase-shift filter seems unfindable.
\par
On the other hand, if the signal strength of the input varies, we should be balancing its expectation by weighting functions. This is what I did in Figure~\ref{tsa-dallpass}. I chose a weighting function equal to the inverse of the absolute value of the filter output plus an $\epsilon$. Since the weighting function depends on the output, the process is iterative. The value of $\epsilon$ chosen was 20\% of the maximum signal value.
\par
Since the iteration is a nonlinear procedure, it might not always work.
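\par
The following sketch is my own illustration of the kind of iteration just described, not the program used for the figure; it reuses the dense-solver approach of the earlier {\tt pef()} sketch, and all names and parameter choices are assumptions.
\begin{verbatim}
import numpy as np

def weighted_pef(x, n, w):
    """PE filter (1, a_1, ..., a_n) minimizing the *weighted* output power."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([x[n - j - 1 : len(x) - j - 1] for j in range(n)])
    d, ww = x[n:], w[n:]                          # residual samples and their weights
    a, *_ = np.linalg.lstsq(X * ww[:, None], -d * ww, rcond=None)
    return np.concatenate(([1.0], a))

def blind_allpass_decon(x, n, niter=5):
    """Iteratively reweighted deconvolution: weights ~ 1/(|output| + eps)."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x)
    for _ in range(niter):
        a = weighted_pef(x, n, w)
        out = np.convolve(x, a)[: len(x)]
        eps = 0.2 * np.max(np.abs(out))           # floor at 20% of the maximum value
        w = 1.0 / (np.abs(out) + eps)             # the weight depends on the output
    return a, out
\end{verbatim}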
There is a well-established body of theory that says this won't work with Gaussian signals, and Figure~\ref{tsa-dgauss} is consistent with that theory.
\plot{tsa-dgauss}{2.9in}{\FIGS/tsa/dgauss}{ (tsa-dgauss) Failure of blind all-pass deconvolution for Gaussian signals. The top signal is based on Gaussian random numbers. Lower signals are based on successive integer powers of Gaussian signals. Filters (on the right) fail for the Gaussian case, but improve as signals become sparser. }
\par
Back in Figure~\ref{tsa-wz33ie} I used weighting functions roughly inverse to the envelope of the signal, with the floor for the envelope taken at 20\% of the signal maximum. Since weighting functions were used, the filters need not have turned out symmetrical about their centers, but the resulting asymmetry seems to be small.