Under some circumstances, Bayesian estimation theory provides a computational prescription for selecting a maximum likelihood model which represents the information inherent in the data and computing its a posteriori variance. When the modeling operator is linear, the data statistics are known and Gaussian, and signal and noise are known to be statistically independent, noise variance is the only additional parameter required to set up a linear system for the maximum likelihood estimator. However if these statistical hypotheses are not satisfied or if the modeling operator is nonlinear, Bayesian theory does not give an explict prescription for selecting a representative model.
This paper explores an alternative selection principle, which I call extremal regularization. Extremal regularization does not require the extensive statistical assumptions of the Bayesian theory. It selects from the feasible set a model minimizing some auxiliary model property. Use of extremal regularization requires (1) a choice of auxiliary model property to extremize, (2) choices of norms to measure the data misfit and auxiliary model property, (3) knowledge of the data noise level, and (4) an algorithm for finding an extremal solution. Depending on application, (1), (2), and (3) involve greater or lesser degrees of arbitrariness. When the modeling operator is linear and the auxiliary property is quadratic in the model, extremal regularization amounts to minimization of a quadratic function subject to a quadratic constraint. In effect such an algorithm finds the penalty parameter or relative weight between data and model spaces as a function of the prescribed noised level. Note that the noise level has a much more obvious intuitive or physical meaning than the penalty parameter, though it is not always easy to determine from available data.
This notion is not new in geophysics. For example D. D. Jackson proposed similar ideas more than 20 years ago Jackson (1973, 1976). From Jackson (1979):
There are some who hold the recalcitrant point of view that the normal Backus-Gilbert resolving kernels tend to present results in too abstract a fashion, but that the use of priori data makes any error estimate rather arbitrary. For these, the only satisfactory evidence on which to base physical conclusions is a catalogue of models which fit the data well, are physically plausible, and contain among them models which have the maximum and minimum presence of some hypothetical feature. I must admit to having strong sympathies for this point of view.
Jackson extremized linear functions of the model subject to prescribed data misfit. These extrema represent the ends of model error bars.
This ``recalcitrant'' point of view is also natural when there is some nonlinear auxiliary quantity that should be minimized - or ideally even zeroed out - by virtue of fundamental model requirements of the model. This is the case for example in Claerbout's proposal for signal extraction via Jensen inequalities Claerbout (1992, 1998) and for the extended version of differential semblance optimization for velocities Gockenbach and Symes (1997). In other settings, for example the deconvolution problem used as an example in this report, extremizing an auxiliary quantity serves merely to pick out a ``simplest'' solution amongst many.
The Moré-Hebden algorithm finds the reelative weight between data and model spaces by applying Newton's method to the so called secular equation. The secular equation requires that the norm of the auxiliary model property be equal to its prescribed value. Since this norm will change as you change the relative weight between data misfit and auxiliary model property, the secular equation determines the weight. This idea is much used in numerical optimization, where quadratically constrained quadratic minimization goes under the name ``trust region problem'' Dennis and Schnabel (1983). The published versions of Moré-Hebden Hebden (1973); Moré (1977), also Björk (1997), pp. 205-6, have typically used LU decompositions to solve the linear systems required by Newton's method, so are limited to small and medium scale problems. This report describes a version appropriate for large scale problems, using conjugate gradient iteration. The presence of this ``inner'' iteration and the necessary lack of precision in the solution of the Newton system has interesitng consequences for convergence of the algorithm.
This report presents extremal regularization of linear inverse problems in the form of the quadratically constrained quadratic minimization problem solved by the Moré-Hebden algorithm. Examples based on ill-posed 1D deconvolution illustrate the extremal regularization concept and the behaviour of the algorithm.