Next: Internal boundaries to multidimensional Up: Multidimensional autoregression Previous: Seismic field data examples

PEF ESTIMATION WITH MISSING DATA

If we are not careful, our calculation of the PEF will treat the missing data values as though they were real data, and hence it will get the wrong PEF. To avoid this pitfall, imagine a PEF finder that uses weighted least squares where the weighting function vanishes on those fitting equations that involve missing data and is unity elsewhere. Instead of weighting bad results by zero, we simply will not compute them. The residual there is initialized to zero and never changed. Likewise, in the adjoint, these components of the residual never contribute to the gradient. So now we need a convolution program that produces no output wherever a missing input would spoil it.

Recall that there are two ways of writing convolution: one form for when we are interested in finding the filter inputs, and another for when we are interested in finding the filter itself. We have already coded the first form as operator helicon. That operator was useful in missing-data problems. Now we want to find a prediction-error filter, so we need the other form, and we need to ignore the outputs that would be broken by missing inputs. The operator module hconest does the job.

Module hconest: helix convolution, adjoint is the filter.
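The listing of hconest is not reproduced here. As a stand-in, the following is a minimal one-dimensional Python sketch of the same idea: convolution viewed as a linear operator on the filter, together with its adjoint, where flagged outputs are never computed. The function names `conv_forward` and `conv_adjoint`, the `lags` list, and the `skip` flags are assumptions made for illustration and are not taken from the module itself. The sketch computes one output per data sample and does not include the helix bookkeeping.

```python
def conv_forward(x, a, lags, skip):
    """Residual r[i] = sum_k a[k] * x[i - lags[k]]; r[i] stays 0 where skip[i]."""
    r = [0.0] * len(x)
    for k, lag in enumerate(lags):
        for i in range(lag, len(x)):
            if skip[i]:
                continue  # broken equation: never computed, residual stays zero
            r[i] += a[k] * x[i - lag]
    return r


def conv_adjoint(x, r, lags, skip):
    """Filter-space output g[k] = sum_i r[i] * x[i - lags[k]], skipping the same rows."""
    g = [0.0] * len(lags)
    for k, lag in enumerate(lags):
        for i in range(lag, len(x)):
            if skip[i]:
                continue  # a skipped residual never contributes to the gradient
            g[k] += r[i] * x[i - lag]
    return g


# Hypothetical example: six samples, three filter coefficients,
# with the 2nd through 5th outputs flagged as unusable.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
a = [1.0, -0.5, 0.25]
lags = [0, 1, 2]
skip = [False, True, True, True, True, False]
r = conv_forward(x, a, lags, skip)
g = conv_adjoint(x, [1.0] * len(x), lags, skip)
```

Because both loops skip exactly the same rows, the two functions remain an adjoint pair, which is what a gradient-based solver relies on.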

Now identify the broken regression equations, those that use missing data. Suppose that y₂ and y₃ were missing or bad data values in the fitting goal (27). That would spoil the 2nd, 3rd, 4th, and 5th fitting equations. Thus we would want to be sure that w₂, w₃, w₄ and w₅ were zero. (We would still be left with enough equations to find (a₁,a₂).)

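$$
\bold 0 \ \approx\ \bold W \bold r \ =\
\left[ \begin{array}{cccccccc}
w_1 & . & . & . & . & . & . & . \\
. & w_2 & . & . & . & . & . & . \\
. & . & w_3 & . & . & . & . & . \\
. & . & . & w_4 & . & . & . & . \\
. & . & . & . & w_5 & . & . & . \\
. & . & . & . & . & w_6 & . & . \\
. & . & . & . & . & . & w_7 & . \\
. & . & . & . & . & . & . & w_8
\end{array} \right]
\left[ \begin{array}{ccc}
y_1 & 0 & 0 \\
y_2 & y_1 & 0 \\
y_3 & y_2 & y_1 \\
y_4 & y_3 & y_2 \\
y_5 & y_4 & y_3 \\
y_6 & y_5 & y_4 \\
0 & y_6 & y_5 \\
0 & 0 & y_6
\end{array} \right]
\left[ \begin{array}{c} 1 \\ a_1 \\ a_2 \end{array} \right]
$$ (27)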

What algorithm will enable us to identify the regression equations that have become defective, now that y₂ and y₃ are missing? Examine this calculation:

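$$
\left[ \begin{array}{c} m_1 \\ m_2 \\ m_3 \\ m_4 \\ m_5 \\ m_6 \\ m_7 \\ m_8 \end{array} \right]
\ =\
\left[ \begin{array}{ccc}
0 & 0 & 0 \\
1 & 0 & 0 \\
1 & 1 & 0 \\
0 & 1 & 1 \\
0 & 0 & 1 \\
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{array} \right]
\left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right]
$$ (28)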

The value of m_i tells us how many inputs are missing from the calculation of the residual r_i. Where none are missing, we want unit weights w_i=1. Where any are missing, we want zero weights w_i=0.
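To make the counting concrete, here is a small Python sketch of the calculation for the example in the text. It assumes six data samples and a three-term filter (so eight fitting equations), with the 2nd and 3rd samples missing. The function name `count_missing` is a hypothetical one chosen for this illustration.

```python
def count_missing(missing, nfilt):
    """Convolve the missing-data indicator with a filter of all ones.

    Returns m, where m[i] is the number of missing inputs used by
    fitting equation i (transient convolution, len(missing) + nfilt - 1 rows).
    """
    m = [0] * (len(missing) + nfilt - 1)
    for i, bad in enumerate(missing):
        for k in range(nfilt):
            m[i + k] += int(bad)
    return m


# y2 and y3 missing (positions 1 and 2 when counting from zero)
missing = [False, True, True, False, False, False]
m = count_missing(missing, 3)
w = [1 if mi == 0 else 0 for mi in m]  # unit weight only where nothing is missing

print(m)  # [0, 1, 2, 2, 1, 0, 0, 0]
print(w)  # [1, 0, 0, 0, 0, 1, 1, 1]
```

The zero weights fall on the 2nd through 5th equations, in agreement with the text.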

From this example we recognize a general method for identifying defective regression equations and weighting them by zero: Prepare a vector like $\bold y$ with ones where data is missing and zeros where the data is known. Prepare a vector like $\bold a$ where all values are ones. These are the vectors we put in equation (28) to find the m_i and hence the needed weights w_i. It is all done in module misinput.

Module misinput: mark bad regression equations.
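The listing of misinput is not reproduced here. The following Python sketch, under several stated assumptions, shows how such weights might be used end to end to estimate a three-term PEF (1, a1, a2) from data with bad samples. The data are synthetic, generated by a known recursion so that the correct answer is (1, -1, 0.5). The two unknowns are found by solving the 2x2 normal equations directly, which is a substitution made for brevity and not the solver used elsewhere in the text. Equations whose filter would reach before the first sample are also skipped, which is a choice made for this sketch. The function names `usable_rows` and `estimate_pef3` are hypothetical.

```python
def usable_rows(missing, nfilt):
    """w[i] = 1 if output i uses only known samples inside the data, else 0."""
    n = len(missing)
    w = [0] * n
    for i in range(nfilt - 1, n):  # rows reaching before sample 0 stay at 0
        m = sum(missing[i - k] for k in range(nfilt))  # count of missing inputs
        w[i] = 1 if m == 0 else 0
    return w


def estimate_pef3(y, w):
    """Least-squares (a1, a2) minimizing sum of w[i]*(y[i] + a1*y[i-1] + a2*y[i-2])**2."""
    s11 = s12 = s22 = s01 = s02 = 0.0
    for i in range(2, len(y)):
        if not w[i]:
            continue  # defective equation: not computed at all
        s11 += y[i - 1] * y[i - 1]
        s12 += y[i - 1] * y[i - 2]
        s22 += y[i - 2] * y[i - 2]
        s01 += y[i] * y[i - 1]
        s02 += y[i] * y[i - 2]
    det = s11 * s22 - s12 * s12
    a1 = (-s01 * s22 + s02 * s12) / det
    a2 = (-s02 * s11 + s01 * s12) / det
    return a1, a2


# Synthetic data obeying y[t] = y[t-1] - 0.5*y[t-2], so the true PEF is (1, -1, 0.5).
y = [1.0, 1.0]
for t in range(2, 12):
    y.append(y[t - 1] - 0.5 * y[t - 2])

# Corrupt two samples and flag them as missing.
missing = [False] * len(y)
for bad in (4, 5):
    y[bad] = 99.0
    missing[bad] = True

w = usable_rows(missing, 3)
a_masked = estimate_pef3(y, w)                       # uses intact equations only
a_naive = estimate_pef3(y, [0, 0] + [1] * (len(y) - 2))  # uses the bad samples too
```

In this synthetic case the masked estimate recovers the generating coefficients, while the naive estimate is pulled away from them by the two bad samples.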


Stanford Exploration Project
4/27/2004