
Estimating the regularization parameter from the noise level

For arbitrary $\lambda > 0$, denote by $x(\lambda)$ the solution of the normal equations

\begin{displaymath}
(\lambda A^TA + R^TR)\,x(\lambda) = \lambda A^Td\end{displaymath}

Set

\begin{displaymath}
\phi(\lambda) = \sqrt{(Ax(\lambda)-d)^T(Ax(\lambda)-d)}\end{displaymath}

The secular equation is

\begin{displaymath}
\phi(\lambda) = \sigma\sqrt{d^Td}\end{displaymath}

and its solution, if it has one, gives the correct value of the Lagrange multiplier $\lambda$.
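
For concreteness, here is a minimal dense-matrix sketch of evaluating $\phi$ and the secular mismatch, assuming numpy arrays A, R, d and a noise fraction sigma; it is an illustration only, not the large-scale implementation described below.

\begin{verbatim}
import numpy as np

def phi(lam, A, R, d):
    """Residual norm phi(lam) = ||A x(lam) - d|| at the solution of the
    normal equations (dense direct solve, for illustration only)."""
    M = lam * (A.T @ A) + R.T @ R
    x = np.linalg.solve(M, lam * (A.T @ d))
    return np.linalg.norm(A @ x - d)

def secular_mismatch(lam, A, R, d, sigma):
    """phi(lam) - sigma * sqrt(d^T d); a root is the correct multiplier."""
    return phi(lam, A, R, d) - sigma * np.sqrt(d @ d)
\end{verbatim}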

The Moré-Hebden algorithm takes its cue from the simplest possible case: $x$ and $d$ are one-dimensional, and $A$ and $R$ are scalars. In that very special case,

\begin{displaymath}
x(\lambda) = \frac{\lambda Ad}{\lambda A^2 + R^2}\end{displaymath}

hence

\begin{displaymath}
\phi(\lambda) = \frac{R^2\vert d\vert}{\lambda A^2 + R^2}\end{displaymath}

i.e. the reciprocal of $\phi$ is a linear function of $\lambda$. This suggests that Newton's method is more likely to converge quickly when applied to

\begin{displaymath}
\psi(\lambda) \equiv \frac{1}{\phi(\lambda)} = \frac{1}{\sigma\sqrt{d^Td}}\end{displaymath}

and that is exactly what the Moré-Hebden algorithm does.
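
A quick numerical check of this linearity (the scalar values are made up, purely for illustration):

\begin{verbatim}
import numpy as np

# Scalar case: phi(lam) = R^2 |d| / (lam A^2 + R^2), so 1/phi(lam)
# is an affine function of lam, and Newton's method applied to
# psi = 1/phi converges in a single step.
A, R, d = 2.0, 0.5, 3.0
lams = np.array([0.1, 1.0, 10.0])
recip = (lams * A**2 + R**2) / (R**2 * abs(d))      # 1/phi(lam)
slope = (recip[1] - recip[0]) / (lams[1] - lams[0])
assert np.allclose(recip, recip[0] + slope * (lams - lams[0]))
\end{verbatim}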

The iteration proceeds as follows:

\begin{displaymath}
\lambda_{k+1} = \lambda_k - \frac{\psi(\lambda_k) - \frac{1}{\sigma\sqrt{d^Td}}}{\psi'(\lambda_k)}\end{displaymath}

in which $\psi'$ stands for the derivative of $\psi$, which you compute as follows:

\begin{displaymath}
\psi'(\lambda) = -\frac{\phi'(\lambda)}{\phi^2(\lambda)}\end{displaymath}

Now, differentiating $\phi^2(\lambda) = (Ax(\lambda)-d)^T(Ax(\lambda)-d)$ gives

\begin{displaymath}
\phi'(\lambda) = \frac{(Ax(\lambda)-d)^TAx'(\lambda)}{\phi(\lambda)}\end{displaymath}

From the normal equations, differentiation with respect to $\lambda$ yields

\begin{displaymath}
(\lambda A^TA + R^TR)\,x'(\lambda) = A^T(d - Ax(\lambda))\end{displaymath}

so

\begin{displaymath}
\phi'(\lambda) = -\frac{(Ax(\lambda)-d)^TA\,(\lambda A^TA + R^TR)^{-1}A^T(Ax(\lambda)-d)}{\phi(\lambda)} \;\le\; 0\end{displaymath}

Putting all of this together, one obtains the following algorithm for updating $\lambda$:

1. Solve the normal equations $(\lambda A^TA + R^TR)\,x(\lambda) = \lambda A^Td$, and form the residual $r = Ax(\lambda)-d$ and $\phi(\lambda) = \sqrt{r^Tr}$.
2. If $\phi(\lambda)$ is sufficiently close to $\sigma\sqrt{d^Td}$, stop.
3. Solve $(\lambda A^TA + R^TR)\,w = A^Tr$, and set $\phi'(\lambda) = -(A^Tr)^Tw/\phi(\lambda)$.
4. Take the Newton step for $\lambda$ given above.

The first and third steps involve solution of linear systems, which in geophysical applications may be very large. Therefore, in contrast to conventional implementations of this algorithm, I use conjugate gradient iteration (Björk, 1997) to compute the solutions of these systems. As one might expect, the error reduction attained by these inner iterations affects the overall convergence rate of the algorithm.
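
The sketch below assembles these steps in Python, with a hand-coded conjugate gradient routine for the two inner solves; the function names, tolerances, and stopping rules are my own illustrative choices rather than the paper's.

\begin{verbatim}
import numpy as np

def cg(matvec, b, tol=1e-10, maxiter=500):
    """Basic conjugate gradients for the SPD system matvec(x) = b."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    bnorm = np.linalg.norm(b)
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * bnorm:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def more_hebden_update(lam, A, R, d, sigma):
    """One Newton step on psi(lam) = 1/phi(lam) for the secular equation.

    Returns the proposed new lambda and phi at the current lambda."""
    target = sigma * np.sqrt(d @ d)
    matvec = lambda v: lam * (A.T @ (A @ v)) + R.T @ (R @ v)
    # Step 1: solve the normal equations for x(lam).
    x = cg(matvec, lam * (A.T @ d))
    r = A @ x - d
    ph = np.linalg.norm(r)                    # phi(lam)
    # Step 3: second linear solve gives phi'(lam) <= 0.
    w = cg(matvec, A.T @ r)
    dphi = -((A.T @ r) @ w) / ph
    dpsi = -dphi / ph**2                      # psi'(lam) > 0
    lam_new = lam - (1.0 / ph - 1.0 / target) / dpsi
    return lam_new, ph
\end{verbatim}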

A final detail: since $\lambda = \epsilon^{-2}$ must remain positive, I have replaced any large decrease implied by the above formula with a bisection strategy. Since $\phi' < 0$, once $\lambda$ is too small (which forces the weight onto the regularization term and increases the residual), the algorithm produces steady increases in $\lambda$ and converges very rapidly, usually in one or two steps, so long as the normal equations are solved successfully. This is not always the case: failure to converge rapidly appears to signal large data components associated with very small eigenvalues, and is a sure sign that the noise level estimate $\sigma$ has been chosen too small.
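
A hypothetical driver illustrating the outer loop with that safeguard, reusing the routine from the previous sketch; the synthetic data, the factor defining a "large decrease," and the convergence tolerance are all assumptions of mine.

\begin{verbatim}
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))             # synthetic forward operator
R = np.eye(30)                                # identity regularization
x_true = rng.standard_normal(30)
noise = 0.1 * rng.standard_normal(50)
d = A @ x_true + noise
sigma = np.linalg.norm(noise) / np.linalg.norm(d)  # noise level assumed known

lam = 1.0
for _ in range(25):
    lam_new, ph = more_hebden_update(lam, A, R, d, sigma)
    if abs(ph - sigma * np.sqrt(d @ d)) <= 1e-6 * np.sqrt(d @ d):
        break
    # lambda = eps^{-2} must stay positive: replace any large Newton
    # decrease with bisection instead.
    lam = lam_new if lam_new > 0.1 * lam else 0.5 * lam
\end{verbatim}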

