
PEF ESTIMATION WITH MISSING DATA

If we are not careful, our calculation could use the missing data to find the PEF, and hence it would get the wrong PEF. To avoid this pitfall, imagine a PEF finder that uses weighted least squares, where the weighting function vanishes on those fitting equations that involve missing data and is unity elsewhere. Instead of weighting bad results by zero, we simply will not compute them. The residual there will be initialized to zero and never changed. Likewise for the adjoint: these components of the residual will never contribute to a gradient. Module hconest assumes it knows the bad locations (where outputs depend on unknown inputs) and skips computation there.

 

module hconest {              # masked helix convolution, adjoint is the filter.
  use helix
  real, dimension( :), pointer :: x
  type( filter)                :: aa
#% _init( x, aa)
#% _lop( a, y)
    integer  ia, ix, iy
    do ia = 1, size( a) {
       do iy = 1 + aa%lag( ia), size( y) {
          if( aa%mis( iy)) cycle         # skip outputs touching missing data
          ix = iy - aa%lag( ia)
          if( adj)  a( ia) += y( iy) * x( ix)
          else      y( iy) += a( ia) * x( ix)
       }
    }
}
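As a cross-check on the loops above, here is a hypothetical Python sketch of the same masked convolution (the names `hconest`, `lags`, and `mis` are this sketch's own; the book's module is in its Ratfor dialect). It assumes 0-based indexing, a filter `a` whose leading 1 is implicit, and a Boolean mask that is True where an output depends on a missing input:

```python
import numpy as np

def hconest(adj, add, a, y, x, lags, mis):
    # Masked convolution of the known signal x with filter a;
    # the adjoint accumulates into the filter, not the signal.
    if not add:
        (a if adj else y)[:] = 0.
    for ia, lag in enumerate(lags):
        for iy in range(lag, len(y)):
            if mis[iy]:            # skip fitting equations that
                continue           # involve missing data
            ix = iy - lag
            if adj:
                a[ia] += y[iy] * x[ix]
            else:
                y[iy] += a[ia] * x[ix]

y = np.zeros(4)
hconest(False, False, np.array([2.]), y,
        np.array([1., 2., 3., 4.]), [1],
        [False, False, True, False])
# the masked output y[2] stays zero and never enters a gradient
```

Because the forward and adjoint passes use the same mask, they remain exact transposes of one another, restricted to the unmasked rows.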

Suppose that $y_4$ were a missing or a bad data value in the fitting goal (27). That would spoil the 4th, 5th, and 6th fitting equations. Thus we would want to be sure that $w_4$, $w_5$, and $w_6$ were zero.
\begin{displaymath}
\bold 0
\ \approx\ \bold W \bold r \ =\
\left[ \begin{array}{cccccccc}
 w_1 &     &     &     &     &     &     &     \\
     & w_2 &     &     &     &     &     &     \\
     &     & w_3 &     &     &     &     &     \\
     &     &     & w_4 &     &     &     &     \\
     &     &     &     & w_5 &     &     &     \\
     &     &     &     &     & w_6 &     &     \\
     &     &     &     &     &     & w_7 &     \\
     &     &     &     &     &     &     & w_8
\end{array} \right]
\left[ \begin{array}{ccc}
 y_1 &     &     \\
 y_2 & y_1 &     \\
 y_3 & y_2 & y_1 \\
 y_4 & y_3 & y_2 \\
 y_5 & y_4 & y_3 \\
 y_6 & y_5 & y_4 \\
     & y_6 & y_5 \\
     &     & y_6
\end{array} \right]
\left[ \begin{array}{c}
 1 \\
 a_1 \\
 a_2
\end{array} \right]
\end{displaymath} (27)
When missing $y_i$ are sprinkled around, which of the $w_i$ should vanish? A simple solution (that we cannot use) is to put zeros in place of missing data and then take the product of every element in a row of the $\bold Y$ matrix. The product vanishes if any element in the row vanishes. We cannot use this solution because it would require us to have the matrix, when actually we have only an operator, a program that applies the matrix. A feasible solution uses complementary logic: We prepare a template like $\bold y$ with ones where data is missing and zeros where the data is known. We prepare a template like $\bold a$ whose values are all ones. We put the templates themselves into (27) and apply the operator. Where the outputs are nonzero we have defective fitting equations, and that is where we want the weights to be zero. It is all done in module misinput.

module misinput {                # find a mask of missing filter inputs
  use helicon
contains
  subroutine find_mask( known, aa) {
    logical,  intent( in)          :: known( :)
    type( filter)                  :: aa
    real, dimension( size( known)) :: rr, dfre
    integer                        :: stat
    where( known)
       dfre = 0.
    elsewhere
       dfre = 1.                 # template: ones mark missing inputs
    end where
    call helicon_init( aa)
    aa%flt = 1.                  # filter of all ones
    stat = helicon_lop( .false., .false., dfre, rr)
    aa%flt = 0.
    where( rr > 0.) aa%mis = .true.    # nonzero output: defective equation
  }
}
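The template logic can be verified with a small hypothetical Python sketch (the function name `find_mask` and its `lags` argument are this sketch's own): convolve a template with ones at the missing locations by a filter of all ones, then flag every nonzero output.

```python
import numpy as np

def find_mask(known, lags):
    # template: 1 marks a missing input, 0 a known one
    dfre = np.where(known, 0., 1.)
    rr = np.zeros(len(known))
    for lag in [0] + list(lags):      # lag 0 is the PEF's leading "1"
        for iy in range(lag, len(known)):
            rr[iy] += dfre[iy - lag]
    return rr > 0.                    # True where a weight must vanish

# y_4 missing, filter (1, a_1, a_2) with lags 1 and 2:
known = np.array([True, True, True, False, True, True])
mis = find_mask(known, [1, 2])
# mis is True exactly at the 4th, 5th, and 6th positions,
# matching the w_4, w_5, w_6 that must be zero in (27)
```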

EXERCISES:

  1. The Galilee data set is a collection of tracks that can be put on a regular mesh using the linear interpolation operator $\bold 0\approx \bold L\bold m-\bold d$ along with some regularization. We observe that some of the tracks have different mean values than others, which show up in our previous images. The tracks themselves are not clearly demarcated in the data. We can eliminate the mean and suppress low-frequency trends if we filter the residual with a simple low-cut filter. The simplest thing is the derivative along the track. Call this ${\partial\over\partial s}$. This gives the fitting goal $\bold 0\approx {\partial\over\partial s} (\bold L\bold m-\bold d)$. Since ${\partial\over\partial s}$ is a filter, a one-dimensional gradient, we should not apply it to missing data. For this survey we have not so much missing data as erratic data: data outside the lake, water bottoms above the water surface, and simple bursts. Processing the depth z(s) with the gradient ${\partial\over\partial s}$ introduces new problems where we hit a spike or where one of the unmarked tracks ends and another begins. Fortunately, the original data has gone through a preliminary examination identifying where many of the suspicious points lie. The original triples, $(x_s, y_s, z_s)$, have been supplemented by a weighting function $(x_s, y_s, z_s, w_s)$ that vanishes at the suspicious locations. (If I did this, I cannot find it now. The tool Suspicious.rst in jon/res/mad should be helpful.) What you are supposed to do is consider the suspicious points to be missing data values. Use knowledge of them with module misinput to design a filter, the one-dimensional gradient along the track. Choose suitable regularization. The result should be a track-free map of the lake. If the suspicious data points were all found by the pre-editor, the spikes should be gone too. Another description of this problem is at http://sepwww.stanford.edu/sep/jon/trash/mad/
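As a starting point for the exercise, here is a hypothetical Python sketch of the masked along-track derivative, taking the one-dimensional gradient to be the two-term filter (1, -1); the names `masked_gradient`, `z`, and `w` are this sketch's own, and the depth values are synthetic:

```python
import numpy as np

def masked_gradient(z, w):
    # Along-track derivative (1, -1) of the depth track z; any output
    # touching a suspicious sample (w == 0) is zeroed, following the
    # misinput recipe, so spikes never enter a residual.
    r = np.diff(z)                          # r[i] = z[i+1] - z[i]
    good = (w[1:] > 0.) & (w[:-1] > 0.)     # both inputs must be trusted
    return np.where(good, r, 0.)

z = np.array([10., 10.2, 99., 10.4, 10.5])  # 99. plays the role of a burst
w = np.array([1., 1., 0., 1., 1.])          # weight vanishes at the spike
r = masked_gradient(z, w)                   # both residuals touching 99. are zeroed
```

In the full exercise this masking would be applied to the residual $\bold L\bold m-\bold d$ inside the iterative solver, not to the raw depths, but the bookkeeping of which outputs to suppress is the same.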



 
Stanford Exploration Project
12/26/2000