
PEF ESTIMATION WITH MISSING DATA

If we are not careful, our PEF calculation will treat missing data values as though they were real data, and hence it will get the wrong PEF. To avoid this pitfall, imagine a PEF finder that uses weighted least squares where the weighting function vanishes on those fitting equations that involve missing data. The weighting would be unity elsewhere. In practice, instead of computing bad results and then weighting them by zero, we simply will not compute them. The residual there will be initialized to zero and never changed. Likewise for the adjoint, these components of the residual will never contribute to a gradient. So now we need a convolution program that produces no outputs where missing inputs would spoil them.
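To make the idea concrete, here is a minimal sketch in Python of a three-term PEF fit that skips every fitting equation touching a missing value. It is an illustration only, not the solver used in this book: the function name `fit_pef3`, the `known` mask, and the synthetic test signal are all assumptions, and the two unknowns are found from the 2x2 normal equations rather than by an iterative method.

```python
def fit_pef3(y, known):
    """Least-squares fit of (a1, a2) in r[t] = y[t] + a1*y[t-1] + a2*y[t-2].

    Hypothetical helper: equations whose three inputs are not all known
    are skipped, which is the same as giving them zero weight.
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(y)):
        if not (known[t] and known[t - 1] and known[t - 2]):
            continue  # spoiled equation: never computed
        d, u, v = y[t], y[t - 1], y[t - 2]
        s11 += u * u
        s12 += u * v
        s22 += v * v
        b1 -= u * d
        b2 -= v * d
    # Solve [[s11, s12], [s12, s22]] (a1, a2) = (b1, b2) by Cramer's rule.
    det = s11 * s22 - s12 * s12
    return ((b1 * s22 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det)


# Synthetic signal (an assumption) obeying y[t] = 1.5*y[t-1] - y[t-2],
# so its exact PEF is (1, -1.5, 1).
y = [1.0, 2.0]
for t in range(2, 12):
    y.append(1.5 * y[t - 1] - y[t - 2])

# Pretend samples 4 and 5 were lost and replaced by zeros.
bad = list(y)
bad[4] = bad[5] = 0.0
known = [True] * len(bad)
known[4] = known[5] = False

a_masked = fit_pef3(bad, known)             # spoiled equations skipped
a_naive = fit_pef3(bad, [True] * len(bad))  # zeros treated as data
print("masked fit:", a_masked)
print("naive fit: ", a_naive)
```

In this sketch the masked fit recovers the coefficients of the generating recursion, while the naive fit is pulled away from them by the four equations that touch the zeroed samples.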

Recall there are two ways of writing convolution, equation ([*]) when we are interested in finding the filter inputs, and equation ([*]) when we are interested in finding the filter itself. We have already coded equation ([*]), operator helicon [*]. That operator was useful in missing data problems. Now we want to find a prediction-error filter so we need the other case, equation ([*]), and we need to ignore the outputs that will be broken because of missing inputs. The operator module hconest does the job.

Module hconest: helix convolution, adjoint is the filter.
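The following Python sketch shows what such an operator pair might look like in one dimension. It is not the hconest source: the names `masked_conv_forward` and `masked_conv_adjoint`, the explicit `lags` list, and the boolean `mis` array marking spoiled outputs are assumptions made for illustration. The forward operator maps filter coefficients to outputs, the adjoint maps outputs back to filter coefficients, and both skip the flagged outputs.

```python
def masked_conv_forward(x, lags, a, mis):
    """y[iy] = sum over k of a[k] * x[iy - lags[k]], skipping flagged outputs.

    x    : data (fixed inside the operator)
    lags : positive lag of each filter coefficient (hypothetical layout)
    a    : filter coefficients, one per lag
    mis  : mis[iy] is True where output iy would use a missing input
    """
    y = [0.0] * len(x)
    for k, lag in enumerate(lags):
        for iy in range(lag, len(x)):
            if mis[iy]:
                continue  # spoiled output stays at its initial zero
            y[iy] += a[k] * x[iy - lag]
    return y


def masked_conv_adjoint(x, lags, y, mis):
    """Adjoint of masked_conv_forward: a[k] = sum over iy of y[iy] * x[iy - lags[k]]."""
    a = [0.0] * len(lags)
    for k, lag in enumerate(lags):
        for iy in range(lag, len(x)):
            if mis[iy]:
                continue  # spoiled output never contributes to the gradient
            a[k] += y[iy] * x[iy - lag]
    return a


# Small demonstration with made-up numbers.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mis = [False, True, True, True, True, False]
print(masked_conv_forward(x, [1, 2], [2.0, 3.0], mis))
print(masked_conv_adjoint(x, [1, 2], [1.0] * 6, mis))
```

Because both loops visit the same (coefficient, output) pairs and skip the same outputs, the pair should pass the dot-product test for any choice of mask.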

Now identify the broken regression equations, those that use missing data. Suppose that $y_2$ and $y_3$ were missing or bad data values in the fitting goal (27). That would spoil the 2nd, 3rd, 4th, and 5th fitting equations. Thus we would want to be sure that $w_2$, $w_3$, $w_4$ and $w_5$ were zero. (We would still be left with enough equations to find $(a_1,a_2)$.)
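 \begin{displaymath}
\bold 0
\ \approx\ \bold W \bold r \ =\ 
\left[
 \begin{array}{cccccccc}
  w_1 & . & . & . & . & . & . & . \\
  . & w_2 & . & . & . & . & . & . \\
  . & . & w_3 & . & . & . & . & . \\
  . & . & . & w_4 & . & . & . & . \\
  . & . & . & . & w_5 & . & . & . \\
  . & . & . & . & . & w_6 & . & . \\
  . & . & . & . & . & . & w_7 & . \\
  . & . & . & . & . & . & . & w_8
 \end{array} \right]
\left[
 \begin{array}{ccc}
  y_1 & 0 & 0 \\
  y_2 & y_1 & 0 \\
  y_3 & y_2 & y_1 \\
  y_4 & y_3 & y_2 \\
  y_5 & y_4 & y_3 \\
  y_6 & y_5 & y_4 \\
  0 & y_6 & y_5 \\
  0 & 0 & y_6
 \end{array} \right]
\left[
 \begin{array}{c}
  1 \\
  a_1 \\
  a_2 \end{array} \right]\end{displaymath} (27)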
What algorithm will enable us to identify the regression equations that have become defective, now that $y_2$ and $y_3$ are missing? Examine this calculation:
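 \begin{displaymath}
\left[
 \begin{array}{c}
  m_1 \\  m_2 \\  m_3 \\  m_4 \\  m_5 \\  m_6 \\  m_7 \\  m_8
 \end{array} \right]
\ =\ 
\left[
 \begin{array}{c}
  0 \\  1 \\  2 \\  2 \\  1 \\  0 \\  0 \\  0
 \end{array} \right]
\ =\ 
\left[
 \begin{array}{ccc}
  0 & 0 & 0 \\
  1 & 0 & 0 \\
  1 & 1 & 0 \\
  0 & 1 & 1 \\
  0 & 0 & 1 \\
  0 & 0 & 0 \\
  0 & 0 & 0 \\
  0 & 0 & 0
 \end{array} \right]
\left[
 \begin{array}{c}
  1 \\  
  1 \\  
  1 \end{array} \right]\end{displaymath} (28)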
The value of $m_i$ tells us how many inputs are missing from the calculation of the residual $r_i$. Where none are missing, we want unit weights $w_i=1$. Where any are missing, we want zero weights $w_i=0$.

From this example we recognize a general method for identifying defective regression equations and weighting them by zero: Prepare a vector like $\bold y$ with ones where data is missing and zeros where the data is known. Prepare a vector like $\bold a$ where all values are ones. These are the vectors we put in equation (28) to find the $m_i$ and hence the needed weights $w_i$. It is all done in module misinput.

Module misinput: mark bad regression equations.
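The recipe can be sketched in a few lines of Python. This is not the misinput module itself: the function name `mark_bad_equations`, the transient (full-length) convolution, and the six-sample trace in the demonstration are assumptions chosen to mirror the example with the 2nd and 3rd samples missing and a three-term filter.

```python
def mark_bad_equations(missing, nfilt):
    """Count missing inputs per regression equation and derive the weights.

    missing : 1 where a data value is missing, 0 where it is known
    nfilt   : number of filter coefficients (the all-ones filter length)
    Returns (m, w): m[i] counts the missing inputs of equation i, and
    w[i] is 1 where m[i] == 0 and 0 otherwise.
    """
    n = len(missing)
    m = [0] * (n + nfilt - 1)  # transient convolution output length
    for i, flag in enumerate(missing):
        if flag:
            for k in range(nfilt):  # convolve with a filter of ones
                m[i + k] += 1
    w = [1 if count == 0 else 0 for count in m]
    return m, w


# Second and third samples missing, three-term filter.
m, w = mark_bad_equations([0, 1, 1, 0, 0, 0], 3)
print("m =", m)
print("w =", w)
```

Only the positions where `m` is zero keep a unit weight, so in this demonstration the 2nd through 5th equations are the ones switched off.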