%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
%+                                                                           +
%+  NOTICE: This article is also available formatted with illustrations.    +
%+                                                                           +
%+  SEE: http://sepwww.stanford.edu/oldreports/sep61/                        +
%+                                                                           +
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\chapter{Time series analysis}
\setcounter{chapter}{\TSA}
%%%%\setcounter{page}{391}
%%%%\footer{SEP-61}
In chapter \CONJ\ we learned of many operators and how conjugates are a first approximation to inverses. In chapter \LS\ we learned how to make inverse operators from conjugate operators by the least-squares (LS) method using conjugate gradients (CG). The many applications of least squares to the convolution operator constitute the subject known as {\it time series analysis}. Here we examine applications of time series analysis to reflection seismograms. These applications further illuminate the theory of least squares in the areas of weighting functions and stabilization.
\par
In the simplest applications, solutions may be most easily found in the frequency domain. When complications arise, the time domain is better: there we use the convolution operator directly with the method of least squares.
\par
A first complicating factor for the frequency domain is a required boundary in the time domain, such as that between past and future, or a requirement that a filter be nonzero only within a stated time interval. Another factor that attracts us to the time domain from the Fourier domain is weighting functions. As we saw at the beginning of chapter \LS, weighting functions are appropriate whenever a signal or image amplitude varies from place to place. Most of the literature on time-series analysis applies to the limited case of uniform weighting functions. Such time series are said to be {\it stationary}. This means that their statistical properties do not change in time. In real life, particularly in the analysis of echoes, signals are never stationary in time and space. Stationarity is a reasonable starting assumption, but you should know how to go beyond it so you can take advantage of the many opportunities that do arise. In order of increasing difficulty in the frequency domain are the following complications:
\begin{itemize}
\item A time boundary such as that between past and future.
\item More time boundaries, such as those delimiting a filter.
\item Non-stationary signal, i.e.~time-variable weighting.
\item Time-axis stretching such as normal moveout.
\end{itemize}
\par
We won't have difficulty with any of these complications here because we'll stay in the time domain and set up and solve optimization problems using the conjugate-gradient (CG) method. Thus we can cope with great complexity in problem formulation and get the right answer without approximations. By contrast, analytic or partly analytic methods can be more economical, though they generally solve somewhat different problems than those given to us by nature.
\section{SHAPING FILTER}
A shaping filter is a simple least-squares filter that converts one waveform into another. Shaping filters arise in a variety of contexts. In the simplest context, predicting one infinitely long time series from another, the shaping filter can be found in the Fourier domain.
\subsection{Source waveform and multiple reflections}
Figure~\ref{tsa-wz24} shows some reflection seismic data recorded at nearly vertical incidence from an arctic ice sheet.
\plot{tsa-wz24}{2.3in}{\FIGS/tsa/wz24}{ (tsa-wz24) Some of the inner offset seismograms from Arctic data set~24 (Yilmaz and Cumro). }
Apparently the initial waveform is somewhat complex, but the water-bottom reflection does not complicate it further. You can confirm this by noticing the water-bottom multiple reflection, i.e.~the wave that bounces first from the water bottom, then from the water surface, and then a second time from the water bottom. This multiple reflection is similar, but with opposite polarity, to the shape of the primary water-bottom reflection. (The opposite polarity results from the reflection at the ocean surface, where the acoustic pressure, the sum of the downgoing wave plus the upgoing wave, vanishes.)
\par
Other data recorded in water of similar depth shows a different reflection behavior. The bottom gives back not a single reflection, but a train of reflections. Let this train of reflections from the ocean {\it Floor} be denoted by $F(Z)$. Instead of looking like $-F(Z)$, the first multiple reflection can look like $-F(Z)^2$. Figure~\ref{tsa-wz30} shows an example.
\plot{tsa-wz30}{2.77in}{\FIGS/tsa/wz30}{ (tsa-wz30) Two displays of a common-shot collection of seismograms from offshore Crete (Yilmaz and Cumro data set 30). Top display is called ``raster'' and bottom display ``wiggle.'' Raw data scaled by $t^2$. }
Notice you can hardly see the soft-mud arrival on the second multiple. More generally, on multiple reflections the ratio of mudstone strength to soft-mud strength increases with multiple order. Mathematically, multiple order is like $n$ and the wave train like $(1+Z)^n$. Notice in $(1+Z)^n$ that the ratio of the coefficient of $Z^1$ to the coefficient of $Z^0$ is $n$. Physically, in a multiple reflection there is only one water-bottom path, but more paths to the mudstone. \todo{fig needed showing two pegleg paths.}
\par
Figures \ref{tsa-wz30} and \ref{tsa-wz24} illustrate a typical contrast of arctic data with data from temperate or tropical regions. The arctic water-bottom reflection is generally hard, indicating that the bottom is in a constant state of erosion from the scraping of the ice floes and the carrying away of sediments by the bottom currents. In temperate and tropical climates, by contrast, the bottom is often covered with soft sediments, the top layer being unconsolidated mud and deeper layers being mud consolidated into mudstone.
\par
Now we devise a simple mathematical model for the multiple reflections in Figures~\ref{tsa-wz24} and~\ref{tsa-wz30}. There are two unknown waveforms, the source waveform $S(Z)$ and the ocean-floor reflection $F(Z)$. The water-bottom primary reflection $P(Z)$ is the convolution of the source waveform with the water-bottom response, so $P(Z)=S(Z)F(Z)$. The first multiple reflection $M(Z)$ sees the same source waveform, the ocean floor, a minus one for the free surface, and the ocean floor again. So the observations $P(Z)$ and $M(Z)$ as functions of the physical parameters are
\begin{eqnarray}
P(Z)&=&S(Z)\,F(Z) \label{tsa.PP} \\
M(Z)&=&-S(Z)\,F(Z)^2 \label{tsa.MM}
\end{eqnarray}
In Figure~\ref{tsa-wz24} it appears that $F(Z)$ is nearly an impulse, whereas Figure~\ref{tsa-wz30} is dominated by the nonimpulsive aspect of $F(Z)$. Algebraically the solutions of equations (\ref{tsa.PP}) and (\ref{tsa.MM}) are
\begin{eqnarray}
F(Z)&=&-M(Z)/P(Z) \label{tsa.FF} \\
S(Z)&=&-P(Z)^2/M(Z) \label{tsa.SS}
\end{eqnarray}
\par
These solutions can be computed in the Fourier domain as described earlier with Figure~\ref{ls-harlan2}.
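\par
For concreteness, a stabilized spectral division of this kind might be sketched as follows. This is only an illustration, not one of the programs of this book; the function name, the array names, and the stabilization parameter {\tt eps} are assumptions of the sketch.
\begin{verbatim}
import numpy as np

def spectral_divide(num, den, eps=0.01):
    """Stabilized division num/den carried out in the frequency domain."""
    n = len(num) + len(den)            # a longer transform reduces wraparound artifacts
    N = np.fft.rfft(num, n)
    D = np.fft.rfft(den, n)
    # divide, protecting against near-zero values of the denominator spectrum
    Q = N * np.conj(D) / (np.abs(D)**2 + eps * np.max(np.abs(D))**2)
    return np.fft.irfft(Q, n)

# For the water-bottom response F(Z) = -M(Z)/P(Z): given a primary trace P and a
# first-multiple trace M, an estimate would be  F = spectral_divide(-M, P).
\end{verbatim}
The {\tt eps} term is a crude stand-in for the spectral smoothing mentioned just below; without it, notches in the spectrum of the denominator make the quotient unrealistically long and noisy.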
Alternately, the method of least squares can be used. The Fourier method seems slightly clearer, but a disadvantage of the Fourier method is that the answers for $S(Z)$ and $F(Z)$ could turn out to be unrealistically long in the time domain. The Fourier method would probably need to be modified by spectral smoothing to shorten the response functions. (The relation of spectral roughness to signal duration is illustrated in chapter \RAND.) The onset and duration of the filters $S(Z)$ and $F(Z)$ are much more readily controlled by a time-domain method. The time-domain solutions are called ``shaping filters.''
\par
To use least squares we begin by rewriting (\ref{tsa.FF}) and (\ref{tsa.SS}) as
\begin{eqnarray}
P(Z)F(Z)&\approx &-M(Z) \label{tsa.floor} \\
M(Z)S(Z)&\approx &-P(Z)^2 \label{tsa.source}
\end{eqnarray}
The unknowns $F(Z)$ and $S(Z)$ are readily constrained to lie in any time window by expressing either (\ref{tsa.floor}) or (\ref{tsa.source}) as a transient convolution like equation~(\ref{conj.contran}). We can easily use the method of conjugate gradients to solve each system by employing the program {\tt contran()} on page~\pageref{contran}. The conjugate-gradient program {\tt cgmeth()} on page~\pageref{cgmeth} needs only to have the matrix multiply replaced by {\tt contran()}, as we did with the missing-data program {\tt miss1()} on page~\pageref{miss1}. Thus
\progblock{shaper}{\PROG/shaper.rt}
\par
The goal of finding the filters $F(Z)$ and $S(Z)$ is to best model the multiple reflections so they can be subtracted from the data, enabling us to see what primary reflections have been hidden by the multiples. Typical data includes not only that shown, but also wider source-receiver separations as well as many other nearby shots and their receivers. Corrections need to be made for the hyperbolic traveltime resulting from the lateral separation between shot and receiver. Diffractions are a result of lateral imperfections in the generally flat sea floor. These spatial aspects of the topic are considered at great length in my earlier book ``\IEI,'' and we will investigate them here only in a limited way.
\subsection{Shaping a ghost to a spike}
An exasperating problem in seismology is the ``ghost'' problem, in which a waveform is replicated a moment later because of a strong nearby reflector. In marine seismology the nearby reflector is the sea surface. It is near to both the sound gun and the hydrophones, thus giving two ghosts. Upgoing and downgoing waves at the sea surface have opposite polarity because their pressures combine to zero at the surface. So waves seen by the hydrophone encounter the ghost operator $g_t=(1,0,0, \cdots ,-1)$ twice, once for the surface near the source and once for that near the hydrophone. The number of zeros is typically small, depending on the depth of the device. The sound receivers can be kept away from surface water-wave noise by positioning them deeper, but that extends the ghost delay, and as we'll see, this ghost is really hard to eliminate by processing. Suppose the ghost is $G(Z)=1-Z^2$. Theoretically the inverse is of infinite duration, namely $(1,0,1,0,1,0,1,0,1, \cdots)$.
\par
Since an infinitely long operator is not satisfactory, I used the program {\tt shaper()} above to solve a least-squares problem for an antighost operator of finite duration. Since you know the least-{\it squares} method abhors large errors and tends to equalize the errors, you should be able to guess the result.
The filter $(.9,0,.8,0,.7,0,.6,0,.5,0,.4,0,.3,0,.2,0,.1)$, when convolved with $(1,0,-1)$, produces equal squared errors of $.01$ at each output time.
\par
The poor result can be described in the Fourier domain by the zeros in the spectrum of $(1,0,-1)$. Since you can't divide by zero, you shouldn't try to divide by $1-Z^n$, which has zeros uniformly distributed around the unit circle. The method of least squares prevents disaster, but it cannot perform magic.
\par
I don't know whether you consider this result satisfactory, but I consider it a problem in search of a different solution. This problem also arises with seismograms recorded in a shallow borehole. As mentioned, the total problem generally includes many waveforms propagating in many directions, so the whole problem is not as one-dimensional as it may appear from Figures~\ref{tsa-wz30} and \ref{tsa-wz24}, in which I did not display the wide-offset signals.
\exer
\item The velocity of sound in water is about 1485 meters/sec. How many zeros are in the ghost filter for geophones at 10 meters depth with data sampled at 2 milliseconds?
\item Define the inputs to subroutine {\tt shaper()} that gave the filter $(.9,0,.8, \cdots ,.1)$ mentioned above.
\exerend
\section{SYNTHETIC DATA FROM FILTERED NOISE}
A basic model for describing the random character of signals is to regard them as random numbers passed into a filter. Practical work often consists of the reverse: deducing the filter and deconvolving it to see the input.
\subsection{Gaussian signals versus sparse signals}
Most theoretical work is based on random numbers from a Gaussian probability function. The basic theoretical model is that at {\it every} time point a Gaussian random number is produced. In real life we observe such signals, but we also observe signals with a sparser collection of bursts. These bursty signals can be described in either of two ways: (1) many points have zero excitation (or an excitation that is smaller than expected from a Gaussian), or (2) the Gaussian probability function describes the many smaller values, but some larger values are noticeably present.
\par
It turns out that the Gaussian probability function makes the most cryptic signals. It also turns out that theory is best developed for the Gaussian case. Thus Gaussian theory, which is the most pessimistic, tends to be applied to both Gaussian and sparser data. Sparse signals derive from a diverse collection of models, but usually there isn't enough information to establish a convincing model. In practical work, ``non-Gaussian'' generally means ``sparser than Gaussian.''
\plot{tsa-spikes}{2.26in}{\FIGS/tsa/spikes}{ (tsa-spikes) Left is random numbers from a Gaussian probability function. (The random numbers are connected by lines.) Right, the random numbers are cubed, making a signal in which large spikes are sparser. }
Figure~\ref{tsa-spikes} illustrates random signals from a Gaussian probability function and a sparser signal made by cubing the random numbers that emerge from a Gaussian random-number generator.
\subsection{Random numbers into a filter}
Figure~\ref{tsa-leaky} shows random numbers put into a leaky integrator and the resulting spectral amplitude.
\plot{tsa-leaky}{2.2in}{\FIGS/tsa/leaky}{ (tsa-leaky) Left is sparse random noise passed through a leaky integrator. Right is the amplitude spectrum of the output. }
The output spectral amplitude of an integrator would be $|\omega|^{-1}$, but the decay constant in the leaky integrator gives instead the amplitude $(\omega^2+\epsilon^2)^{-1/2}$.
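\par
A minimal sketch of how such a panel can be synthesized follows; it is illustrative only, with the leak constant {\tt rho} and the cubing for sparseness as the only essential ingredients.
\begin{verbatim}
import numpy as np

def leaky_integrate(x, rho=0.95):
    """Leaky integration y_t = x_t + rho * y_{t-1}; the impulse response decays like rho**t."""
    y = np.zeros(len(x))
    acc = 0.0
    for t, xt in enumerate(x):
        acc = xt + rho * acc
        y[t] = acc
    return y

rng = np.random.default_rng(0)
sparse = rng.standard_normal(512)**3      # cubing makes the large spikes sparser
trace = leaky_integrate(sparse)           # damped exponentials follow each big spike
spectrum = np.abs(np.fft.rfft(trace))     # roughly (omega**2 + eps**2)**(-1/2) in shape
\end{verbatim}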
Since the random numbers are sparse, you can see damped exponentials in the data itself. This enables you to confirm the direction of the time axis. If the random numbers had been Gaussian, the expected spectrum would be the same, but you would not be able to see the damped exponentials or detect the direction of time.
\subsection{Random numbers into the seismic spectral band}
Figure~\ref{tsa-band} shows synthetic data designed to look like real seismic noise.
\plot{tsa-band}{2.2in}{\FIGS/tsa/band}{ (tsa-band) Left is Gaussian random noise passed through Butterworth filters to simulate the seismic pass band. Right is the amplitude spectrum of the output. }
Here some Gaussian random numbers are passed into a filter to simulate the seismic passband. Two five-term Butterworth filters were used, a highcut at .4 of the Nyquist and a lowcut at .1 of the Nyquist.
\section{BLIND DECONVOLUTION}
Blind deconvolution presumes the ``white noise into a filter'' model and proceeds to estimate the filter and deconvolve it. Equivalently, an inverse filter is estimated and convolved. The basic idea is simple, but complications arise in the improvements. Chapter \RAND\ shows that the spectrum of random noise is also noise, but positive. Smoothing it approximates the mathematical expectation operator and makes the spectrum tend to a constant. Thus the spectrum of the output of random noise put into a filter is like the spectrum of the filter, but it is jagged because of the noise. This suggests smoothing that spectrum and using it as the spectrum of the unknown filter.
\subsection{Blind deconvolution by Fourier methods}
\par
An estimate of the blind deconvolution filter generally begins with the smoothed amplitude spectrum of the observations, or equivalently with a tapered autocorrelation function. The simplest guess is to inverse transform the spectrum to a time function. The time-domain wavelet (found by inverse transforming the amplitude spectrum) will be a symmetrical signal, but in real life the wavelet should be causal. A causal filter can be created by the straightforward numerical methods of this chapter, or by the more mathematical methods of other chapters. Chapter \SPEC\ shows a Fourier method, called the Kolmogoroff method, for finding a causal wavelet of a given spectrum. Chapter \RAND\ shows that the length of the Kolmogoroff wavelet depends on the amount of spectral smoothing, which in this chapter is like the ratio of the data length to the filter length.
\par
In the blind deconvolution problem, Fourier methods determine the spectrum of the unknown wavelet, but they seem to be unable to determine its phase from measurements; they can only assert it by theory. This is a limitation of the ``stationarity'' assumption, that signal strengths are uniform in time. Where signal strengths are nonuniform, better results can be found with weighting functions and time-domain methods. In Figure~\ref{tsa-dallpass} we'll see that the all-pass filter becomes visible when we take the trouble to apply appropriate weights.
\subsection{The error filter family}
There is a wide variety of least-squares filters. The most important one is the prediction-error (PE) filter. A PE filter has the form $(1, a_1,a_2,a_3, \cdots ,a_n)$, where the coefficients $a_j$ are found by a least-squares procedure so that the output of the filter has minimum power. The filter $(-a_1,-a_2,-a_3, \cdots , -a_n)$ is a prediction filter for the point off its end.
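\par
As a rough illustration of how such coefficients might be found, here is a sketch that uses a generic dense least-squares solver rather than the conjugate-gradient programs of this chapter. The function name {\tt pef} and all details are assumptions of this sketch, not a routine from this book.
\begin{verbatim}
import numpy as np

def pef(x, n):
    """Estimate a prediction-error filter (1, a_1, ..., a_n) for the series x.

    The a_j minimize the sum over t of (x_t + a_1 x_{t-1} + ... + a_n x_{t-n})**2,
    so (-a_1, ..., -a_n) is a prediction filter for the point off its end.
    """
    x = np.asarray(x, dtype=float)
    lagged = [x[n - j - 1 : len(x) - j - 1] for j in range(n)]  # columns x_{t-1}, ..., x_{t-n}
    X = np.column_stack(lagged)
    d = x[n:]                                    # the samples being predicted
    a, *_ = np.linalg.lstsq(X, -d, rcond=None)   # least-squares fit of lagged data to -x_t
    return np.concatenate(([1.0], a))

# np.convolve(x, pef(x, 10))[:len(x)] is then the prediction-error output.
\end{verbatim}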
The prediction-error filter is the time-domain counterpart of the Kolmogoroff wavelet of chapter \SPEC.
\par
A variant form of the PE filter is the gapped PE filter, which, for a gap of two, takes the form $(1, 0, 0, a_3,a_4,\cdots ,a_n)$. Another filter that we'll encounter is the interpolation-error filter. Its coefficients are $(a_{-m}, \cdots, a_{-2}, a_{-1}, 1, a_1, a_2, a_3, \cdots ,a_n)$. It can be gapped too. The program will be discussed after a collection of examples.
\subsubsection{Prediction-error (PE) filters on synthetic data}
The idea of gapping a prediction-error filter is to moderate the goal of converting realistic signals into perfect impulses. Figure~\ref{tsa-dleak} shows synthetic data,
\plot{tsa-dleak}{1.8in}{\FIGS/tsa/dleak}{ (tsa-dleak) Deconvolution of leaky integrator signals with PE filters of various prediction-gap sizes. Inputs and outputs on alternate traces. Gap size increases from left to right. }
sparse noise into a leaky integrator, and deconvolutions with prediction-error filters. Theoretically the filters should turn out to be $1-.9^{\rm gap}Z^{\rm gap}$. The result is generally successful, with the degree of success varying from trace to trace, as seen in the filters obtained.
\par
To see what happens when an unrealistic deconvolution goal is set for prediction error, we could try to compress a wavelet that is resistant to compression, for example the impulse response of a Butterworth bandpass filter. The perfect filter to compress any wavelet is its inverse. But the Butterworth filter has a wide region of its spectrum where it is nearly zero, so any presumed inverse to it must nearly divide by zero over that range. Compressing a Butterworth filter is so tough that I omit the random numbers used in Figure~\ref{tsa-dleak} and apply prediction error to the Butterworth response itself, in Figure~\ref{tsa-dbutter}.
\sideplot{tsa-dbutter}{1.92in}{\FIGS/tsa/dbutter}{ (tsa-dbutter) Butterworth deconvolution by prediction error. }
So we have seen that gapped PE filters sometimes are able to compress a wavelet, and sometimes not. In real life, resonances arise in the earth's shallow layers and, as we'll see, the resonant filter responses can be shortened by PE filters.
\subsubsection{PEF on field data}
\par
Figure~\ref{tsa-wz14} is a nice example illustrating the utility of prediction-error filters.
\plot{tsa-wz14}{2.7in}{\FIGS/tsa/wz14}{ (tsa-wz14) Data from offshore Texas (extracted from Yilmaz and Cumro dataset 14). Wiggle display above and raster below. Inputs above outputs. Filters displayed on the right. }
The input is quasi-sinusoidal, which indicates that PE filtering should be successful. Indeed, some events are uncovered that would not likely have been identified on the input. In this figure, a separate problem is solved for each trace, and the resulting filter is shown on the right.
\subsection{The prediction-error filter output is white}
\par
The most important property of a PE filter is that its output tends to a white spectrum. No matter what the input to this filter, the output tends to whiteness as the number of coefficients $n$ tends to infinity. Thus, the prediction-error filter adapts itself to the input by absorbing all its color. If the input is already white, the $a_j$ coefficients vanish---the PE filter is frustrated because with a white input it can predict nothing, so the output is the same as the input. Thus, if you were to cascade one PE filter after another, you would find that only the first filter does anything.
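\par
A quick numerical check of this statement, reusing the hypothetical {\tt pef()} sketch shown earlier (all names and numbers here are illustrative assumptions): estimate a PE filter from colored noise, apply it, then estimate a second PE filter from the output.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
white = rng.standard_normal(2000)
leak = 0.9**np.arange(60)                      # truncated leaky-integrator response
colored = np.convolve(white, leak)[:2000]      # colored noise: white noise into a filter

a1 = pef(colored, 20)                          # pef() as sketched earlier
out1 = np.convolve(colored, a1)[:2000]         # prediction error: close to white
a2 = pef(out1, 20)                             # a second PE filter cascaded on that output

print(np.round(a1[:3], 2))                     # about [1, -0.9, 0]: the color is absorbed
print(np.round(np.max(np.abs(a2[1:])), 2))     # should be small: nothing left to predict
\end{verbatim}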
If the input is a sinusoid, it is exactly predictable by a three-term recurrence relation, so all the color is absorbed by a three-term PE filter (see exercises). The power of a PE filter is that a short filter can often extinguish, and thereby represent, the information in a long filter.
\par
It is a useful practical feature that the output of a PE filter is white. Imagine the reverberation of the soil layer, highly variable from place to place, as the resonance between the surface and deeper consolidated rocks varies rapidly with surface location as a result of geologically recent fluvial activity. The spectral color of this erratic variation on surface-recorded seismograms is compensated by a PE filter. Of course, it is not desired that PE-filtered seismograms be white, but once they all have the same spectrum, it is easy to postfilter them to any desired spectrum.
\par
Since the output spectrum of a PE filter is white, the PE filter itself must have a spectrum inverse to that of its input. Indeed, an effective mechanism of spectral estimation, developed by John P. Burg and described in chapter \TOE, is to compute a PE filter and look at the inverse of its spectrum.
\par
Another interesting property of the PE filter is that it is minimum phase. The best proofs of this property are found in my earlier book, ``\FGDP.'' The proofs assume uniform weighting functions.
\subsection{Non-whiteness of gapped PEF output}
When a PE filter is constrained to have a few of its near-zero-lag coefficients be zero, the output no longer tends to be white as the number of coefficients in the filter tends to infinity. If $f_1$, the filter coefficient of $Z = e^{i \omega \Delta t}$, vanishes, then $F(\omega)$ lacks the slow variation in $\omega$ that this term provides. It lacks just the kind of spectral variation that could boost weak near-Nyquist noises up to the strength of the main pass band. With such variation absent by the constraint, the growth of Nyquist-region energy is no longer a necessary byproduct of PE filtering.
\par
Figure~\ref{tsa-wz27} illustrates a PE filter with a long gap.
\plot{tsa-wz27}{2.63in}{\FIGS/tsa/wz27}{ (tsa-wz27) Data from offshore Canada (extracted from Yilmaz and Cumro dataset 27) processed by gapped prediction error. Inputs above outputs. Filters displayed on the right. Nicely suppressed multiples in boxes. Badly suppressed multiples above diagonal lines. }
(The gap was chosen to be a little less than the water depth.) This example nicely shows the suppression of some multiple reflections, but unfortunately I don't see any primary reflections that have been uncovered. Because the prediction gap is so long, the filter causes no visible change to the overall spectrum. Notice how much more the spectrum was broadened by the shorter-gapped filter in Figure~\ref{tsa-wz14}. The theoretical association of prediction-gap width with spectral broadening is examined next. Another interesting feature of Figure~\ref{tsa-wz27} that we will investigate later is a geometrical effect. This shows up as poor multiple removal on and above the diagonal lines. It happens because of the nonzero separation of the sound source and receiver.
\subsection{Proof that the output of a PE filter is white}
\label{'white_proof'}
The proof is lengthy, but it is worthwhile because it reviews many basic concepts from chapters \CONV\ and \ZZ. You can skip the proof below, which probably belongs in a theory chapter.
\par
The basic idea of least-squares fitting is that the residual is orthogonal to the fitting functions. Applied to the PE filter, this idea means that the output of the PE filter is orthogonal to lagged inputs. The orthogonality applies only for lags in the past, because prediction knows only the past while it aims at the future. What we want to show is slightly different, namely that the output is uncorrelated with {\it itself} (not with the input) for lags in {\it both} directions.
\par
By way of background, first consider two filters that are minimum-phase and mutually inverse, $A(Z)B(Z)=1$, so $A=1/B$ and $\overline{A} = 1/\overline{B}$. Their spectra are also inverses, so $\overline{A}\overline{B} B A = 1$. We will later recognize $BA$ as a white PE output. Dividing $\overline{A}\overline{B}BA$ through by $\overline{A}$ gives
\begin{equation}
[\overline{B}(1/Z)] [B(Z) A(Z)] \eq { 1 \over \overline{A}(1/Z)} \eq \overline{B}(1/Z) \eq b_0 +b_1Z^{-1} +b_2Z^{-2} + \cdots
\label{tsa.oneside}
\end{equation}
whose coefficients vanish for positive powers of $Z$ but not for negative powers. We'll come to associate $[\overline{B}(1/Z)]$ with lagged input and $[B(Z)A(Z)]$ with PEF output. The right side of~(\ref{tsa.oneside}) shows they are uncorrelated in one sense of time, but not the other.
\par
For the proof we take a signal $X(Z)$ that by hypothesis is the output of white noise into an unknown filter $B(Z)$. The calculation begins from the autocorrelation of $x_t$, which is expected to be the autocorrelation of $b_t$, i.e.~$\E [\overline{X}X] = \overline{B} B$. (More about the statistical expectation ``\E'' is found in chapter \RAND.) Since any spectrum has a minimum-phase factorization, the given information is unchanged if we take $B(Z)$ to be minimum phase. We define the PEF by minimizing the power in $X(Z)A(Z)$. This power is the coefficient of $Z^0$ in the spectrum
$$ {\rm power} \eq {\rm coef~of~} Z^0 {\rm ~in~} \overline{A}(1/Z) \overline{B}(1/Z) B(Z) A(Z) $$
Setting to zero the derivative with respect to each coefficient in $\overline{A}$ (expanded details in chapter \TOE) gives the zero coefficients on the positive powers of $Z$ in equation~(\ref{tsa.oneside}). To show that the output $X(Z)A(Z)$ is uncorrelated with itself, we take equation~(\ref{tsa.oneside}), replace $\overline{B} B$ by $\E [\overline{X} X]$, and then multiply by $A(1/Z)$, obtaining the white-output statement $\E\,[A(1/Z)X(1/Z)]\,[X(Z)A(Z)]=1$.
\subsection{Postcoloring versus prewhitening}
The output of a PE filter is white (unless it is gapped). But people don't like to look at white signals. Signals are normally sampled adequately densely, which means that their spectra are small anywhere near the Nyquist frequency. There is rarely energy above half the Nyquist frequency, and generally little but marine noise above a quarter of it. To avoid boosting these noises, the ungapped PE filter is generally altered or accompanied by other filters. Three common approaches are:
\begin{itemize}
\item Use a gapped filter.
\item Deconvolve, then apply a filter with the desired spectrum $S(\omega)$.
\item Prefilter the input with $S(\omega)^{-1}$, then deconvolve with an ungapped PEF, and finally postfilter with $S(\omega)$.
\end{itemize}
The last process is called ``prewhitening,'' a name whose motivation is somewhat involved. The idea seems to be that the prefilter removes known color so that the least-squares coefficients are not ``wasted'' predicting what is already known.
So theoretically the prefilter spectrum $S(\omega)^{-1}$ is the inverse of the prior estimate of the input spectrum. In real life, that is merely an average of estimates from other data. If the desired output spectrum does not happen to be $S(\omega)$, it doesn't matter, since any final display filter can be used. Although this is a nice idea, I have no example to illustrate it.
\par
There is also the question of what phase the postfilter should have. Here are some cautions against the two obvious choices:
\begin{itemize}
\item symmetrical. A symmetrical filter has a noncausal response.
\item causal. If a later step of processing is to make a coherency analysis for velocity versus time, then the effective time will be more like the signal maximum than the first break.
\end{itemize}
Since the postfilter is broadband, its phase is not as important as that of the deconvolution operator, which tries to undo the phase of a causal and resonant earth.
\section{WEIGHTED ERROR FILTER CALCULATION}
Here we'll look at the motivation for introducing weighting functions into filter analysis and see how to write the CG program that produced most of the examples in this chapter.
\subsection{AGC (automatic gain control)}
\label{'AGC'}
Echoes get weaker with time, though the information content is roughly independent of the signal strength. Echoes also vary in strength as different materials are encountered by the outgoing wave. Programs for echo analysis typically divide the data by a scaling factor that is a smoothed average of the signal strength. This practice is nearly universal, though it is fraught with hazards. An example of an AGC would be to compute the divisor by forming the absolute value of the signal and then smoothing it with the program {\tt triangle()} on page~\pageref{triangle} or the program {\tt leaky2()} on page~\pageref{leaky2}. Pitfalls are the strange amplitude behavior surrounding the water bottom, and the overall loss of the information contained in amplitudes. Personally, I have found that the gain function $t^2$ nearly always eliminates the need for AGC, but I have no doubt that AGC is occasionally needed. (A theoretical explanation for $t^2$ is found in my other book ``\IEI.'')
\subsection{Gain before or after convolution}
It is common practice to apply AGC to echo soundings before filter analysis. This is a questionable practice. The proper practice is to analyze first according to the laws of physics and only at the last stage to apply gain functions for purposes of statistical estimation and final display. Here we'll examine correct and approximate ways of setting up deconvolution problems with gain functions. Then we'll use CG to solve the proper formulation.
\par
Solving problems in the time domain offers an advantage over the frequency domain because in the time domain it is easy to control the interval where the solution should exist. Another advantage of the time domain arises when weighting functions are appropriate. I've noticed that people sometimes use Fourier solutions, thereby forcing themselves to use uniform weighting when another weighting would be more appropriate. Since we look at echoes, it is unavoidable that we apply gain functions. Weighting is always justified on the process {\it outputs}, but it is an approximation of unknown validity on the data that is {\it input} to those processes. I'll clarify this approximation with an equation having two filter points and an output of three time points.
In real-life applications the output is typically one to two thousand points and the filter 5 to 50 points. The valid formulation of a filtering problem is
\begin{equation}
\left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right]
\quad \approx \quad
\left[ \begin{array}{ccc} w_1 & 0 & 0 \\ 0 & w_2 & 0 \\ 0 & 0 & w_3 \end{array} \right]
\left(
\left[ \begin{array}{c} d_1 \\ d_2 \\ d_3 \end{array} \right]
\ - \
\left[ \begin{array}{cc} b_1 & 0 \\ b_2 & b_1 \\ 0 & b_2 \end{array} \right]
\ \left[ \begin{array}{c} f_1 \\ f_2 \end{array} \right]
\right)
\label{tsa.weighted}
\end{equation}
The weights $w_t$ are any positive numbers you choose; typically the $w_t$ are chosen so that the residual components are about equal in magnitude.
\par
If instead the weighting function is applied to the inputs, we have an approximation that is somewhat different:
\begin{equation}
\left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right]
\quad \approx \quad
\left[ \begin{array}{c} w_1 d_1 \\ w_2 d_2 \\ w_3 d_3 \end{array} \right]
\ - \
\left[ \begin{array}{cc} w_1 b_1 & 0 \\ w_2 b_2 & w_1 b_1 \\ 0 & w_2 b_2 \end{array} \right]
\ \left[ \begin{array}{c} f_1 \\ f_2 \end{array} \right]
\label{tsa.kludge}
\end{equation}
Comparing the weighted output-residual equation~(\ref{tsa.weighted}) to the weighted input-data equation~(\ref{tsa.kludge}), we note that the right column of the matrix in~(\ref{tsa.kludge}) does not match that in~(\ref{tsa.weighted}). If the right-hand column in~(\ref{tsa.kludge}) had been defined as $(0, w_2 b_1, w_3 b_2)'$ instead of $(0, w_1 b_1, w_2 b_2)'$, then the equations would match, but then the matrix in~(\ref{tsa.kludge}) would not be a simple convolution and some fast solution methods would not apply. (Such fast methods include both Fourier and Toeplitz methods, to be described in chapter \TOE.) The approximation~(\ref{tsa.kludge}) becomes reasonable when the weights are slowly variable, i.e.~$w_t$ is a slowly variable function of $t$. In practice, I think the approximation is often justified for a slow $t^2$ gain but questionable for automatic gains that are faster. The approximation (\ref{tsa.kludge}) enables the use of Fourier methods, because the problem still retains an entirely convolutional form. Compared to Toeplitz methods of solving equation~(\ref{tsa.kludge}), the CG method of solving~(\ref{tsa.weighted}) is slower. The Toeplitz cost is proportional to the number of unknowns instead of its square. Here we are going to see how to solve the problem correctly. If you want to solve the correct problem rapidly, perhaps you can do so by solving the approximate problem first by a quasi-analytic method and then doing a few steps of CG.
\subsection{Setting up any weighted CG program}
Equation~(\ref{tsa.weighted}) is of the form $\bold 0 \approx \bold W (\bold d - \bold B\bold f)$. This can be converted to a new problem without weights by defining a new data vector $\bold W\bold d$ and a new operator $\bold W\bold B$, simply by carrying $\bold W$ through the parentheses to $\bold 0 \approx \bold W \bold d - (\bold W \bold B)\bold f$. Thus, you can use the same program {\tt shaper()} that we used for unweighted least squares. Alternately we can keep the old data and old operator and put the weighting function into the CG program as below. You will notice that the convolution program {\tt contrunc(0,...)} has been replaced by itself followed by a diagonal weighting. Likewise, since the conjugate operator of $\bold W\bold B$ is $\bold B' \bold W'$, we see {\tt contrunc(1,...)} replaced by itself {\it preceded} by the diagonal weighting.
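\par
In code, the wrapping can be as simple as the following sketch. Here a simplified transient convolution and its adjoint stand in for {\tt contrunc()}; the function names and the dot-product test at the end are assumptions of this sketch, not routines from this book.
\begin{verbatim}
import numpy as np

def conv_forward(f, b, nd):
    """Transient convolution d = B f, truncated (or zero padded) to nd output samples."""
    full = np.convolve(b, f)
    out = np.zeros(nd)
    n = min(nd, len(full))
    out[:n] = full[:n]
    return out

def conv_adjoint(r, b, nf):
    """Adjoint of the truncated convolution: f = B' r (a crosscorrelation)."""
    return np.correlate(r, b, mode="full")[len(b) - 1 : len(b) - 1 + nf]

def weighted_forward(f, b, w):
    return w * conv_forward(f, b, len(w))      # W B f: convolve, then weight

def weighted_adjoint(r, b, w, nf):
    return conv_adjoint(w * r, b, nf)          # B' W' r: weight first, then the adjoint

# Dot-product test of the pair (the two numbers should agree to roundoff):
rng = np.random.default_rng(0)
b, w = rng.standard_normal(5), rng.uniform(0.5, 2.0, 30)
f, r = rng.standard_normal(10), rng.standard_normal(30)
print(np.dot(r, weighted_forward(f, b, w)), np.dot(weighted_adjoint(r, b, w, 10), f))
\end{verbatim}
The dot-product test is the standard check that the weighted forward and adjoint operators really are transposes of one another.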
\subsection{Calculating error filters}
The {\it error} in prediction (or interpolation) is often more interesting than the prediction itself. When the predicted component is removed, leaving the unpredictable part, the residual is called the prediction error. Let us see how the program {\tt shaper()} can be used to find an interpolation-error filter like $(f_{-2},f_{-1},1,f_1,f_2)$. The statement of wishes is:
\begin{equation}
\left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{array} \right]
\quad \approx \quad
\left[ \begin{array}{c} . \\ . \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ . \\ . \end{array} \right]
\ \ +\ \
\left[ \begin{array}{cccc}
x_1 & . & . & . \\
x_2 & x_1 & . & . \\
x_3 & x_2 & . & . \\
x_4 & x_3 & x_1 & . \\
x_5 & x_4 & x_2 & x_1 \\
x_6 & x_5 & x_3 & x_2 \\
. & x_6 & x_4 & x_3 \\
. & . & x_5 & x_4 \\
. & . & x_6 & x_5 \\
. & . & . & x_6
\end{array} \right]
\ \left[ \begin{array}{c} f_{-2} \\ f_{-1} \\ f_1 \\ f_2 \end{array} \right]
\label{tsa.almost}
\end{equation}
Changing the polarity of all the matrix elements and bringing the matrix to the other side of the equation gives us the standard form of overdetermined simultaneous equations. After solving this system for $(f_{-2},f_{-1},f_1,f_2)$, we insert the ``1'' to make the IE filter $(f_{-2},f_{-1},1,f_1,f_2)$, which, applied to the data $x_t$, gives the desired IE output.
\par
Notice that the matrix in (\ref{tsa.almost}) is {\it almost} a convolution. It would be a convolution except that the central column is absent. I propose that you not actually solve the system~(\ref{tsa.almost}); instead I'll show you a more general solution that uses the convolution operator itself. That way you won't need to write programs for the many ``almost'' convolution operators arising from the many PE and IE filters with various gaps and lags.
\par
The conjugate-gradient program is a combination of earlier CG programs and the weighting methods we must introduce here. \todo{Want to introduce weighting in CG long before IE filtering}
\begin{itemize}
\item We need to constrain a filter coefficient to be unity, which we can do by initializing it to be unity and then allowing no changes to it.
\item We may wish to constrain some other filter coefficients to be zero (gapping) by initializing them to zero and allowing no changes to them.
\item The polarity-convention change for the unknowns can be put into the operator by replacing the operator with the product of itself and a sign-change operator.
\item We may want the output point to be someplace other than just off the end (as in prediction). Thus we will specify a time lag that denotes the predicted or interpolated time point. The program {\tt contrunc()} is designed for this.
\end{itemize}
Incorporating all these features into {\tt shaper()}, we get {\tt iner()}.
\progblock{iner}{\PROG/iner.rt}
For a filter of the form $(1, f_1,f_2, \cdots ,f_{n-1})$ you would specify {\tt lag=1, gap1=1, gapn=1}. For a filter of the form $(1, 0, f_2, \cdots ,f_{n-1})$ you would specify {\tt lag=1, gap1=1, gapn=2}. For a filter of the form $(f_{-2},f_{-1},1,f_1,f_2)$ you would specify {\tt nf=5, lag=3, gap1=3, gapn=3}.
\par
This program uses the convolution program {\tt contrunc()}, which is handy in practice because its output has the same length as its input.
This convenience is partly offset by a small danger that significant output energy in the ``start-up'' and ``off-end'' zones could be truncated. In terms of equation~(\ref{tsa.almost}), the routine {\tt contrunc()} drops some top and/or bottom rows.
\subsection{Stabilizing technique}
Earlier, some theory for stabilizing least squares was described with equations (\ref{uni.stabfilt}) and (\ref{ls.knownfilt}). I installed this stabilization along with the filter determinations in this chapter, but as I expected, stabilization in this highly overdetermined application showed no advantages. Nevertheless, it is worth seeing how this is done, particularly since the changes to the program make for more informative plots. The input data is modified by appending a zero-padded impulse at the end. The output will contain the filter impulse response in that region. The spike size is chosen to be compatible with the data size, for convenience of the plotting programs. The weighting function in the appended region is scaled according to how much stabilization is desired. Figure~\ref{tsa-wz33} shows the complete input and residual.
\plot{tsa-wz33}{1.38in}{\FIGS/tsa/wz33}{ (tsa-wz33) Data from the North Sea (extracted from Yilmaz and Cumro dataset 33) processed by prediction error. Rightmost box is weighted according to desired stabilization. The truncation event is weighted by zero. }
It also shows the problem that output data flows beyond the length of the input data because of the nonzero length of the filter. This extra output is undoubtedly affected by the truncation of the data, and its energy should not be part of the energy minimization. So it is weighted by zero.
\exer
\item Given a sinusoidal function $x_t = \cos(\omega t + \phi)$, find a three-term recurrence relationship that predicts $x_t$ from the previous two points, namely $x_t = a_1 x_{t-1} + a_2 x_{t-2}$. Hence deduce that a sinusoid is perfectly predictable given two successive past values. The coefficients depend on $\omega$ but not $\phi$.
\item Figure~\ref{tsa-wz14} has a separate filter for each trace. Consider the problem of finding a single filter for all the traces. What is the basic operator and its conjugate? Assemble these operators using subroutine {\tt contrunc()} on page \pageref{contrunc}.
\item Examine the filters in Figure~\ref{tsa-wz33}. Notice that besides the pulse at the water depth there is another weak one at double that depth. Suggest a physical mechanism. Suggest a mechanism relating to computational approximations.
\exerend
\section{INTERPOLATION ERROR}
An interpolation-error (IE) filter has the form $(a_{-m}, \cdots, a_{-2}, a_{-1}, 1, a_1, a_2, a_3, \cdots ,a_n)$, where the $a_t$ coefficients are adjusted to minimize the power in the filter output. This filter has the strange character that if the input spectrum is $S(\omega)$, then the output spectrum will tend to $S(\omega)^{-1}$. Thus the IE filter tends to turn poles into zeros and vice versa. To see why the IE filter inverts the spectrum of the input, we need only recall the basic premise of least-squares methods, that the residual (the output) is orthogonal to the fitting functions (the input at all lags except the zero lag). Thus the crosscorrelation of the input and the output is an impulse. This can only happen if their spectra are inverses. (If the crosscorrelation $\overline{X}(1/Z)\,Y(Z)$ is a constant $c$, then with $Y(Z)=A(Z)X(Z)$ we have $A = c/(\overline{X}X)$, so the output spectrum is $\overline{Y}Y = \overline{A}A\,\overline{X}X = c^2/(\overline{X}X)$, inverse to the input spectrum up to a scale.) This is a disaster for the overall appearance of a seismogram. Such drastic spectral change can be controlled in a variety of ways, as with PE filters, but here there seems to be little experience besides my own.
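\par
As with the PE filter, a rough dense-solver sketch of the IE-filter estimation may help fix ideas. The function name {\tt ief} and all details are assumptions of this sketch; the book's {\tt iner()} program does this job with conjugate gradients and with the gap and lag options described above.
\begin{verbatim}
import numpy as np

def ief(x, m, n):
    """Interpolation-error filter (a_{-m},...,a_{-1}, 1, a_1,...,a_n) for the series x.

    The a_j minimize the power of  x_t + sum over j != 0 of a_j x_{t-j}.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    lags = [j for j in range(-m, n + 1) if j != 0]
    t = np.arange(n, N - m)                       # times where every needed lag exists
    X = np.column_stack([x[t - j] for j in lags])
    a, *_ = np.linalg.lstsq(X, -x[t], rcond=None)
    filt = np.zeros(m + n + 1)
    filt[m] = 1.0                                 # the constrained "1" at zero lag
    for coef, j in zip(a, lags):
        filt[m + j] = coef
    return filt                                   # filt[i] holds the coefficient at lag i-m
\end{verbatim}
Applying the filter with {\tt np.convolve(x, filt)} gives the IE output delayed by {\tt m} samples, since the two-sided filter is stored as a causal array.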
Figure~\ref{tsa-wz33ie} illustrates an interpolation-error result where gapping has been used to limit the color changes.
\plot{tsa-wz33ie}{2.55in}{\FIGS/tsa/wz33ie}{ (tsa-wz33ie) Data from the North Sea (extracted from Yilmaz and Cumro dataset 33) processed by interpolation error. Inputs above outputs. Filters displayed on the right. }
I also chose the gap to try to condense the wavelet. Your judgment is needed to see if the result is successful. Notice also a high-frequency arrival after the diagonal lines. This shows that the IE filters are boosting very high frequencies despite the gapping.
\par
I am fascinated by the theoretical implications of IE filters. This is not the place to expound on my research in this area. But recall that least-squares problems should be formulated with an inverse-covariance weighting on the residuals, and in signal and image analysis this means the residuals should be white in expectation, i.e.~have an impulsive {\it autocorrelation.} What I have noticed is that the output of an IE filter has an impulsive {\it crosscorrelation} with its input, and I am trying to exploit this to make high-quality estimates. (Look back to Figure~\ref{uni-filt} and imagine the bottom panels being the output of a 2-D IE filter, and imagine, instead of using dot products on the images, using cross products between upper and lower images.) Another fact about IE filters is that they tend to be symmetrical functions.
\subsection{Blind all-pass decon}
A well-established theoretical concept that leads to unwarranted pessimism is the idea that blind deconvolution cannot find an all-pass filter. If we carefully examine the analysis leading to that conclusion, we will find lurking the assumption that the weighting function used in the least-squares estimation is uniform. And when this assumption is wrong, so is the conclusion, as Figure~\ref{tsa-dallpass} shows.
\plot{tsa-dallpass}{1.9in}{\FIGS/tsa/dallpass}{ (tsa-dallpass) Four independent trials of deconvolution of sparse noise into an all-pass filter. Alternate lines are input and output. }
Recall that the inverse of an all-pass filter is its time reverse. The reversed shape of the filter is seen on the inputs where there happen to be isolated spikes.
\par
Let us see what theory predicts can't be done, and then I'll tell you how I did it. If you examine the unweighted least-squares error-filter programs, you notice that the first calculation applies the convolution operator and then its transpose. This takes the autocorrelation of the input and uses it as a gradient search direction. Take a white input and pass it through a phase-shift filter. The autocorrelation of the output should be an impulse function. Since the impulse itself is constrained against rescaling, the effective gradient is zero, the solution, an impulse filter, is already at hand, and so a phase-shift filter seems unfindable.
\par
On the other hand, if the signal strength of the input varies, we should be balancing its expectation by weighting functions. This is what I did in Figure~\ref{tsa-dallpass}. I chose a weighting function equal to the inverse of the absolute value of the filter output plus an $\epsilon$. Since the weighting function depends on the output, the process is iterative. The value of $\epsilon$ chosen was 20\% of the maximum signal value.
\par
Since the iteration is a nonlinear procedure, it might not always work.
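\par
The following sketch is my own illustration of the kind of iteration just described, not the program used for the figure; it reuses the dense-solver approach of the earlier {\tt pef()} sketch, and all names and parameter choices are assumptions.
\begin{verbatim}
import numpy as np

def weighted_pef(x, n, w):
    """PE filter (1, a_1, ..., a_n) minimizing the *weighted* output power."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([x[n - j - 1 : len(x) - j - 1] for j in range(n)])
    d, ww = x[n:], w[n:]                          # residual samples and their weights
    a, *_ = np.linalg.lstsq(X * ww[:, None], -d * ww, rcond=None)
    return np.concatenate(([1.0], a))

def blind_allpass_decon(x, n, niter=5):
    """Iteratively reweighted deconvolution: weights ~ 1/(|output| + eps)."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x)
    for _ in range(niter):
        a = weighted_pef(x, n, w)
        out = np.convolve(x, a)[: len(x)]
        eps = 0.2 * np.max(np.abs(out))           # floor at 20% of the maximum value
        w = 1.0 / (np.abs(out) + eps)             # the weight depends on the output
    return a, out
\end{verbatim}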
There is a well-established body of theory that says this won't work with Gaussian signals, and Figure~\ref{tsa-dgauss} is consistent with that theory.
\plot{tsa-dgauss}{2.9in}{\FIGS/tsa/dgauss}{ (tsa-dgauss) Failure of blind all-pass deconvolution for Gaussian signals. The top signal is based on Gaussian random numbers. Lower signals are based on successive integer powers of Gaussian signals. Filters (on the right) fail for the Gaussian case, but improve as signals become sparser. }
\par
Back in Figure~\ref{tsa-wz33ie} I used weighting functions roughly inverse to the envelope of the signal, with the floor for the envelope taken at 20\% of the signal maximum. Since weighting functions were used, the filters need not have turned out symmetrical about their centers, but the resulting asymmetry seems to be small.