next up previous print clean
Next: Effects of and niter Up: M. Clapp: The oddities Previous: Review of regularized inversion

The question of $\epsilon$ and niter

The intention of regularized inversion is to minimize the residuals of both the data fitting goal and the model styling goal. Generally, we think of accomplishing this by estimating the model ${\bf p}$ so that it best satisfies fitting goals (4). However, the estimated model itself is dependent on $\epsilon$ and niter. Our choice of $\epsilon$ and niter can have a significant effect on the model we obtain. To explore this idea, I will first examine $\epsilon$, then niter, showing the relationship between them.

Given the fitting goals (4), the direct solution for our model would be  
 \begin{displaymath}
{\bf p}\ = \ (({\bf L A^{-1}})' {\bf L A^{-1}}\ +\ {\bf \epsilon}^{2} {\bf I})^{-1}({\bf L A^{-1}}) ' {\bf d}\end{displaymath} (5)
so it is clear that the choice of $\epsilon$ will influence the model. Recall that equation (5) can not be used to solve our seismic inversion problems because the matrices involved are so large. Selecting an $\epsilon$ that is appropriate for a given problem and model is not a trivial task.

It is common to set $\epsilon=0$ for preconditioned problems. The inversion will still be affected by the regularization operator because it is present as ${\bf A^{-1}}$ in the data fitting goal. However, it is known that after a number of iterations, $\epsilon=0$ will cause the ${\bf A^{-1}}$ not to have an effect on the data fitting goal. Additionally, Crawley (2000) showed that a non-zero $\epsilon$ can improve the convergence rate of an iterative inversion problem. For these reasons, we would like to find an inexpensive way to choose a non-zero $\epsilon$.

Much work has been done on the idea of selecting $\epsilon$ based on L-curve analysis Calvetti et al. (1999, 2000). Essentially, an L-curve is created by plotting the residual of the model styling goal versus the residual of the data fitting goal using different values of $\epsilon$. This curve is shaped like an L. The vertex of the L is the point that will minimize both of the residuals. This point tells us what the ideal $\epsilon$ is - assuming that we iterate the problem to convergence. This brings us to the question of the number of iterations (niter).

Ideally, an iterative least-squares inversion should be allowed to iterate to convergence, that is, until the error residual is not getting any smaller. Paige and Saunders (1982) show that when we use a conjugate gradient scheme to minimize our problem, it is guaranteed to converge in no more iterations than there are parameters being inverted for. Preconditioning the problem can reduce this number, but even then our seismic inversions are too large to allow them to iterate to convergence. Therefore, we have to decide how many iterations are needed given our available computer resources. We are forced to decide when a model is ``good enough''. Naturally, since we are not iterating to convergence, our L-curve will be different so selecting an $\epsilon$ is now dependent on the choice of niter. In the next section, I will demonstrate the effects of $\epsilon$ and niter for a synthetic imaging problem.


next up previous print clean
Next: Effects of and niter Up: M. Clapp: The oddities Previous: Review of regularized inversion
Stanford Exploration Project
7/8/2003