Under some circumstances, Bayesian estimation theory provides a
computational prescription for selecting a maximum likelihood model
which represents the information inherent in the data and computing
its *a posteriori* variance. When the modeling operator is linear,
the data statistics are known and Gaussian, and signal and noise are known
to be statistically independent, noise variance is the
only additional parameter required to set up a linear system for the
maximum likelihood estimator. However, if these statistical hypotheses
are not satisfied, or if the modeling operator is nonlinear, Bayesian
theory does not give an explicit prescription for selecting a
representative model.
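To fix ideas, the linear Gaussian case can be written out explicitly. The notation below (modeling operator F, data d, noise variance σ²) is introduced only for this sketch, which additionally assumes a unit-covariance Gaussian model prior for concreteness: with d = Fm + n, noise n ~ N(0, σ²I) independent of the signal m ~ N(0, I), the estimator solves the linear system

\[
(F^T F + \sigma^2 I)\, m = F^T d ,
\]

in which the noise variance σ² is indeed the only statistical parameter that must be supplied.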

This paper explores an alternative selection principle, which I call
*extremal regularization*. Extremal regularization does not
require the extensive statistical assumptions of the Bayesian
theory. It selects from the feasible set a model minimizing some
auxiliary model property. Use of extremal regularization requires (1)
a choice of auxiliary model property to extremize, (2) choices of
norms to measure the data misfit and auxiliary model property, (3)
knowledge of the data noise level, and (4) an algorithm for finding an
extremal solution. Depending on the application, (1), (2), and (3)
involve greater or lesser degrees of arbitrariness. When the modeling
operator is linear and the auxiliary property is quadratic in the
model, extremal regularization amounts to minimization of a quadratic
function subject to a quadratic constraint. In effect such an
algorithm finds the penalty parameter, or relative weight between data
and model spaces, as a function of the prescribed noise level. Note
that the noise level has a much more obvious intuitive or physical
meaning than the penalty parameter, though it is not always easy to
determine from available data.
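In symbols, and again with notation (linear modeling operator A, auxiliary-property operator L, data d, noise level σ) chosen here only for illustration, the linear-quadratic case just described reads

\[
\min_m \; \|Lm\|^2 \quad \text{subject to} \quad \|Am - d\|^2 \le \sigma^2 ,
\]

and when the constraint is active its solution also minimizes the weighted sum \(\|Am - d\|^2 + \lambda \|Lm\|^2\) for the particular weight λ at which the misfit of the minimizer equals the noise level; recovering that weight from σ is exactly the task described above.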

This notion is not new in geophysics. For example, D. D. Jackson proposed similar ideas more than twenty years ago (Jackson, 1973, 1976). From Jackson (1979):

There are some who hold the recalcitrant point of view that the normal Backus-Gilbert resolving kernels tend to present results in too abstract a fashion, but that the use of *a priori* data makes any error estimate rather arbitrary. For these, the only satisfactory evidence on which to base physical conclusions is a catalogue of models which fit the data well, are physically plausible, and contain among them models which have the maximum and minimum presence of some hypothetical feature. I must admit to having strong sympathies for this point of view.

Jackson extremized linear functions of the model subject to a prescribed level of data misfit. These extrema represent the ends of model error bars.
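In the same illustrative notation (with c an arbitrary fixed vector defining the linear function and ε the prescribed misfit level), Jackson's construction amounts to

\[
\max_m \ (\text{and} \ \min_m) \quad c^T m \quad \text{subject to} \quad \|Am - d\| \le \varepsilon ,
\]

the two extreme values of \(c^T m\) bracketing the error bar for that feature of the model.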

This ``recalcitrant'' point of view is also natural when there is some
nonlinear auxiliary quantity that should be minimized, or ideally even
zeroed out, by virtue of fundamental requirements on the model. This
is the case, for example, in Claerbout's proposal for signal
extraction *via* Jensen inequalities (Claerbout, 1992, 1998)
and in the extended version of differential semblance optimization
for velocities (Gockenbach and Symes, 1997). In other settings, for example
the deconvolution problem used as an example in this report,
extremizing an auxiliary quantity serves merely to pick out a
``simplest'' solution amongst many.

The Moré-Hebden algorithm finds the relative weight between data and model spaces by applying Newton's method to the so-called secular equation. The secular equation requires that the norm of the auxiliary model property equal its prescribed value. Since this norm changes as the relative weight between data misfit and auxiliary model property changes, the secular equation determines the weight. This idea is much used in numerical optimization, where quadratically constrained quadratic minimization goes under the name of the ``trust region problem'' (Dennis and Schnabel, 1983). The published versions of the Moré-Hebden algorithm (Hebden, 1973; Moré, 1977; see also Björck, 1997, pp. 205-6) have typically used LU decompositions to solve the linear systems required by Newton's method, and so are limited to small and medium scale problems. This report describes a version appropriate for large scale problems, using conjugate gradient iteration. The presence of this ``inner'' iteration, and the necessary lack of precision in the solution of the Newton system, has interesting consequences for the convergence of the algorithm.
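To make the outer/inner structure concrete, the following is a minimal numerical sketch, not the implementation described in the body of this report: it takes the auxiliary property to be the plain model norm, builds the normal operator only through matrix-vector products, and applies Newton's method to the Hebden-Moré form 1/Δ − 1/‖m(λ)‖ = 0 of the secular equation, which is nearly linear in λ. The function names, tolerances, and test problem below are all illustrative assumptions.

```python
import numpy as np

def cg(apply_H, b, tol=1e-10, maxit=500):
    """Conjugate gradients for H x = b, with the symmetric positive
    definite operator H available only as matrix-vector products."""
    x = np.zeros_like(b)
    r = b.copy()           # residual for the zero initial guess
    p = r.copy()
    rs = r @ r
    bnorm = np.linalg.norm(b)
    for _ in range(maxit):
        Hp = apply_H(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * bnorm:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def more_hebden(A, d, delta, lam=1.0, tol=1e-4, maxit=30):
    """Find lam >= 0 such that m(lam) = argmin ||A m - d||^2 + lam ||m||^2
    satisfies the secular equation ||m(lam)|| = delta, via Newton's method
    on the Hebden-More reformulation 1/delta - 1/||m(lam)|| = 0."""
    for _ in range(maxit):
        H = lambda x, lam=lam: A.T @ (A @ x) + lam * x  # normal operator A^T A + lam I
        m = cg(H, A.T @ d)      # inner iteration: model for the current weight
        f = np.linalg.norm(m)
        if abs(f - delta) <= tol * delta:
            break
        # Second inner solve gives the derivative: d||m||/dlam = -(m.q)/||m||.
        q = cg(H, m)
        # Newton step on 1/delta - 1/f(lam), clamped to keep the weight nonnegative.
        lam = max(lam + (f / delta - 1.0) * f**2 / (m @ q), 0.0)
    return m, lam

# A small synthetic test; sizes, noise level, and target norm are arbitrary.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 40))
m_true = rng.standard_normal(40)
d = A @ m_true + 0.1 * rng.standard_normal(60)
m, lam = more_hebden(A, d, delta=0.8 * np.linalg.norm(m_true))
print("weight:", lam, "  model norm:", np.linalg.norm(m))
```

Each Newton step costs two inner conjugate gradient solves with the same operator, one for the model m(λ) and one for the derivative of the secular function; the inexactness of these inner solves is the convergence issue mentioned above.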

This report presents extremal regularization of linear inverse problems in the form of the quadratically constrained quadratic minimization problem solved by the Moré-Hebden algorithm. Examples based on ill-posed 1D deconvolution illustrate the extremal regularization concept and the behaviour of the algorithm.

4/20/1999