next up previous print clean
Next: SEABEAM: Filling the empty Up: Preconditioning Previous: INVERSE LINEAR INTERPOLATION

EMPTY BINS AND PRECONDITIONING

There are at least three ways to fill empty bins. Two require a roughening operator $\bold A$ while the third requires a smoothing operator which (for comparison purposes) we denote $\bold A^{-1}$.The three methods are generally equivalent though they differ in important details.

The original way in Chapter [*] is to restore missing data by ensuring that the restored data, after specified filtering, has minimum energy, say $\bold A\bold m\approx \bold 0$.Introduce the selection mask operator $\bold K$, a diagonal matrix with ones on the known data and zeros elsewhere (on the missing data). Thus $ \bold 0 \approx \bold A(\bold I-\bold K+\bold K)\bold m $ or  
 \begin{displaymath}
\bold 0 \quad\approx\quad
\bold A (\bold I-\bold K) \bold m
\ +\ 
\bold A \bold m_k\;,\end{displaymath} (27)
where we define $\bold m_k$ to be the data with missing values set to zero by $\bold m_k=\bold K\bold m$.

A second way to find missing data is with the set of goals  
 \begin{displaymath}
\begin{array}
{rrrrr}
\bold 0 & \approx & \bold K \bold m & ...
 ... \\ \bold 0 & \approx & \epsilon \bold A \bold m & &\end{array}\end{displaymath} (28)
and take the limit as the scalar $\epsilon \rightarrow 0$.At that limit, we should have the same result as equation (27).

There is an important philosophical difference between the first method and the second. The first method strictly honors the known data. The second method acknowledges that when data misfits the regularization theory, it might be the fault of the data so the data need not be strictly honored. Just what balance is proper falls to the numerical choice of $\epsilon$,a nontrivial topic.

A third way to find missing data is to precondition equation (28), namely, try the substitution $\bold m = \bold A^{-1} \bold p$. 
 \begin{displaymath}
\begin{array}
{rrrrr}
\bold 0 & \approx & \bold K \bold A^{-...
 ...bold m_k \\ \bold 0 & \approx & \epsilon \bold p & &\end{array}\end{displaymath} (29)
There is no simple way of knowing beforehand what is the best value of $\epsilon$.Practitioners like to see solutions for various values of $\epsilon$.Of course that can cost a lot of computational effort. Practical exploratory data analysis is more pragmatic. Without a simple clear theoretical basis, analysts generally begin from $\bold p=0$and abandon the fitting goal $\epsilon \bold I \bold p\approx 0$.Implicitly, they take $\epsilon=0$.Then they examine the solution as a function of iteration, imagining that the solution at larger iterations corresponds to smaller $\epsilon$.There is an eigenvector analysis indicating some kind of basis for this approach, but I believe there is no firm guidance.

Before we look at coding details for the three methods of filling the empty bins, we'll compare results of trying all three methods. For the roughening operator $\bold A$,we'll take the helix derivative $\bold H$.This is logically equivalent to roughening with the gradient $\bold \nabla$because the (negative) laplacian operator is $\bold\nabla'\bold\nabla = \bold H'\bold H $.