In this section, I consider an alternative formulation of the
regularized least-squares optimization.
We start again with the basic equation (1) and
introduce a residual vector r, defining it by the relationship
\begin{displaymath}
  L\,m + \lambda\,r = d\;,
\end{displaymath}
(10)
where $\lambda$ is a scaling parameter. Let us consider a compound model
$\hat{m}$, composed of the model vector $m$ itself and the residual
$r$. With respect to the compound model, equation (10) can be
rewritten as
\begin{displaymath}
  \left[\begin{array}{cc} L & \lambda I_d \end{array}\right]
  \left[\begin{array}{c} m \\ r \end{array}\right] =
  G_d\,\hat{m} = d\;,
\end{displaymath}
(11)
where $G_d$ is a row operator:
\begin{displaymath}
  G_d = \left[\begin{array}{cc} L & \lambda I_d \end{array}\right]\;,
\end{displaymath}
(12)
and $I_d$ represents the data-space identity operator.
System (11) is clearly under-determined with respect to
the compound model $\hat{m}$. If from all possible solutions of this
system we seek the one with the minimal power $\hat{m}^T \hat{m}$, the
formal (ideal) result takes the well-known form
\begin{displaymath}
  \langle\hat{m}\rangle =
  \left[\begin{array}{c} \langle m \rangle \\ \langle r \rangle \end{array}\right] =
  \left[\begin{array}{c}
    L^T \left(L\,L^T + \lambda^2 I_d\right)^{-1} d \\
    \lambda \left(L\,L^T + \lambda^2 I_d\right)^{-1} d
  \end{array}\right]\;.
\end{displaymath}
(13)
To recall the derivation of formula (13), decompose the
effective model vector $\hat{m}$ into two terms
\begin{displaymath}
  \hat{m} = G_d^T\,d_0 + m_0\;,
\end{displaymath}
(14)
where $d_0$ and $m_0$ are to be determined. First, we choose $m_0$ to
be an orthogonal supplement to $G_d^T d_0$. The orthogonality implies
that the objective function
$\hat{m}^T \hat{m} = d_0^T\,G_d\,G_d^T\,d_0 + m_0^T m_0$
is minimized only when $m_0 = 0$. To determine $d_0$,
substitute (14) into equation (11) and solve
the corresponding linear system. The result takes the form of equation
(13).
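Written out, the substitution step implied above proceeds as follows
(with $m_0$ orthogonal to the entire range of $G_d^T$, so that
$G_d\,m_0 = 0$):
\begin{displaymath}
  G_d\,\hat{m} = G_d\,G_d^T\,d_0 =
  \left(L\,L^T + \lambda^2 I_d\right) d_0 = d\;,
\end{displaymath}
hence
\begin{displaymath}
  d_0 = \left(L\,L^T + \lambda^2 I_d\right)^{-1} d
  \quad\mbox{and}\quad
  \hat{m} = G_d^T\,d_0 =
  \left[\begin{array}{c} L^T \\ \lambda\,I_d \end{array}\right]
  \left(L\,L^T + \lambda^2 I_d\right)^{-1} d\;,
\end{displaymath}
which is formula (13).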
Let us show that estimate (13) is exactly equivalent to
estimate (9) from the ``trivial'' model-space
regularization. Consider the operator
\begin{displaymath}
  L^T L\,L^T + \lambda^2 L^T\;,
\end{displaymath}
(15)
which is a mapping from the data space to the model space. We can
group the multiplicative factors in formula (15) in two
different ways, as follows:
\begin{displaymath}
  L^T L\,L^T + \lambda^2 L^T =
  L^T \left(L\,L^T + \lambda^2 I_d\right) =
  \left(L^T L + \lambda^2 I_m\right) L^T\;,
\end{displaymath}
(16)
where $I_m$ denotes the model-space identity operator.
Regrouping the terms in (16), we arrive at the exact
equality between the model estimates $\langle m \rangle$ from equations
(13) and (9):
\begin{displaymath}
  \langle m \rangle =
  L^T \left(L\,L^T + \lambda^2 I_d\right)^{-1} d =
  \left(L^T L + \lambda^2 I_m\right)^{-1} L^T d\;.
\end{displaymath}
(17)
To obtain equation (17), multiply both sides of
(16) by $\left(L^T L + \lambda^2 I_m\right)^{-1}$
from the left and by $\left(L\,L^T + \lambda^2 I_d\right)^{-1}$
from the right. For $\lambda \neq 0$, both
these matrices are indeed invertible.
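As an illustration (not part of the original text), equality (17) can be
verified numerically. The Python sketch below uses an arbitrary random
operator L; the matrix sizes, the data d, and the value of lambda are
illustrative choices.

# Numerical check of equality (17) with illustrative, randomly chosen
# quantities; only numpy is assumed.
import numpy as np

rng = np.random.default_rng(0)
nd, nm = 8, 12                      # data-space and model-space sizes
L = rng.standard_normal((nd, nm))   # arbitrary forward operator
lam = 0.5                           # arbitrary scaling parameter lambda
d = rng.standard_normal(nd)         # arbitrary data vector

# data-space form: m = L^T (L L^T + lam^2 I_d)^{-1} d
m_data = L.T @ np.linalg.solve(L @ L.T + lam**2 * np.eye(nd), d)

# model-space form: m = (L^T L + lam^2 I_m)^{-1} L^T d
m_model = np.linalg.solve(L.T @ L + lam**2 * np.eye(nm), L.T @ d)

print(np.max(np.abs(m_data - m_model)))   # agrees to numerical precision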
Not only the optimization estimate, but also the form of the objective
function, is exactly equivalent for the data-space and model-space
cases. The objective function of model-space least squares,
$\left(L\,m - d\right)^T \left(L\,m - d\right) + \lambda^2 m^T m$,
is connected with the data-space objective function
$\hat{m}^T \hat{m} = m^T m + r^T r$
by the simple proportionality
\begin{displaymath}
  \left(L\,m - d\right)^T \left(L\,m - d\right) + \lambda^2 m^T m =
  \lambda^2 \left(m^T m + r^T r\right) = \lambda^2\,\hat{m}^T \hat{m}\;.
\end{displaymath}
(18)
This fact implies that the iterative methods of optimization, most
notably the conjugate-gradient method (Hestenes and Stiefel, 1952), should
yield the same results for both formulations. Of course, this
conclusion does not take into account the numerical effects of
finite-precision computations.
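As an illustration of this point (not from the original text), the
Python sketch below solves both formulations with the iterative LSQR
solver from SciPy, a conjugate-gradient-type least-squares method; the
operator, data, and damping value are arbitrary choices.

# Both formulations solved iteratively; illustrative quantities only.
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(1)
nd, nm = 8, 12
L = rng.standard_normal((nd, nm))
lam = 0.5
d = rng.standard_normal(nd)

# model-space form: minimize |L m - d|^2 + lam^2 |m|^2
m_model = lsqr(L, d, damp=lam)[0]

# data-space form: minimum-norm solution of [L  lam I_d] [m; r] = d
G = np.hstack([L, lam * np.eye(nd)])
m_data = lsqr(G, d)[0][:nm]

print(np.max(np.abs(m_model - m_data)))   # small, up to solver tolerance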
To move to a more general (and interesting) case of ``non-trivial''
data-space regularization, we need to refer to the concept of model
preconditioning (Nichols, 1994). A preconditioning
operator $P$ is used to introduce a new model $x$ with the equality
\begin{displaymath}
  m = P\,x\;.
\end{displaymath}
(19)
Substituting definition (19) into formula
(11), we arrive at the following ``preconditioned'' form
of the operator $G_d$:
\begin{displaymath}
  \tilde{G}_d = \left[\begin{array}{cc} L\,P & \lambda I_d \end{array}\right]\;.
\end{displaymath}
(20)
The operator $\tilde{G}_d$ applies to the compound model vector
\begin{displaymath}
  \hat{x} = \left[\begin{array}{c} x \\ r \end{array}\right]\;.
\end{displaymath}
(21)
Substituting formula (20) into (13) leads
to the following estimate for $\hat{x}$:
\begin{displaymath}
  \langle\hat{x}\rangle =
  \left[\begin{array}{c} \langle x \rangle \\ \langle r \rangle \end{array}\right] =
  \left[\begin{array}{c}
    P^T L^T \left(L\,P\,P^T L^T + \lambda^2 I_d\right)^{-1} d \\
    \lambda \left(L\,P\,P^T L^T + \lambda^2 I_d\right)^{-1} d
  \end{array}\right]\;.
\end{displaymath}
(22)
Applying formula (19), we obtain the corresponding estimate
for the initial model m, as follows:
\begin{displaymath}
  \langle m \rangle = P\,\langle x \rangle =
  P\,P^T L^T \left(L\,P\,P^T L^T + \lambda^2 I_d\right)^{-1} d\;.
\end{displaymath}
(23)
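To make the preconditioned estimate concrete, here is a small Python
sketch (not from the original text) that evaluates formulas (22) and
(23) directly; the choice of P as causal integration (a lower-triangular
matrix of ones), the sizes, and the value of lambda are purely
illustrative.

# Direct evaluation of the preconditioned estimates (22) and (23);
# all concrete quantities are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
nd, nm = 8, 12
L = rng.standard_normal((nd, nm))
lam = 0.5
d = rng.standard_normal(nd)

P = np.tril(np.ones((nm, nm)))       # hypothetical preconditioner (causal integration)
C = P @ P.T                          # the operator P P^T

rhs = np.linalg.solve(L @ C @ L.T + lam**2 * np.eye(nd), d)
x_est = P.T @ L.T @ rhs              # <x> from formula (22)
m_est = P @ x_est                    # <m> = P <x> from formula (23)

print(np.max(np.abs(m_est - C @ L.T @ rhs)))   # same operator, grouped differently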
Now we can show that estimate (23) is exactly equivalent to
formula (7) from the model-space regularization under the
condition
\begin{displaymath}
  C = P\,P^T = \left(D^T D\right)^{-1}\;.
\end{displaymath}
(24)
Condition (24) assumes that the operator $D^T D$ is
invertible. Consider the operator
\begin{displaymath}
  L^T L\,C\,L^T + \lambda^2 L^T\;,
\end{displaymath}
(25)
which is another mapping from the data space to the model space.
Grouping the multiplicative factors in two different ways, we can
obtain the equality
\begin{displaymath}
  L^T L\,C\,L^T + \lambda^2 L^T =
  L^T \left(L\,C\,L^T + \lambda^2 I_d\right) =
  \left(L^T L + \lambda^2 C^{-1}\right) C\,L^T\;,
\end{displaymath}
(26)
or, in another form,
\begin{displaymath}
  C\,L^T \left(L\,C\,L^T + \lambda^2 I_d\right)^{-1} =
  \left(L^T L + \lambda^2 D^T D\right)^{-1} L^T\;.
\end{displaymath}
(27)
The left-hand side of equality (27) is exactly the projection
operator from formula (23), and the right-hand side is the
operator from formula (7).
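The equivalence can also be checked numerically. The following Python
sketch (an illustration, not the author's code) takes an invertible
preconditioner P, sets C = P P^T and D = P^{-1} so that condition (24)
holds, and compares the data-space estimate (23) with the model-space
estimate (7); all concrete values are arbitrary.

# Numerical check of the equivalence (27); illustrative quantities only.
import numpy as np

rng = np.random.default_rng(3)
nd, nm = 8, 12
L = rng.standard_normal((nd, nm))
lam = 0.5
d = rng.standard_normal(nd)

P = np.tril(np.ones((nm, nm)))       # invertible preconditioner (illustrative)
C = P @ P.T
D = np.linalg.inv(P)                 # regularization operator with (D^T D)^{-1} = C

m_precond = C @ L.T @ np.linalg.solve(L @ C @ L.T + lam**2 * np.eye(nd), d)
m_regular = np.linalg.solve(L.T @ L + lam**2 * D.T @ D, L.T @ d)

print(np.max(np.abs(m_precond - m_regular)))   # agrees to numerical precision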
Comparing formulas (23) and (7), it is
interesting to note that we can turn a trivial regularization into a
non-trivial one by simply replacing the exact adjoint operator $L^T$,
which is a transformation from the data space to the model space,
with the operator $C\,L^T$, in which the adjoint is followed by the
operator $C$ that enforces the model correlations. This fact can be
additionally confirmed by the equality
\begin{displaymath}
  C\,L^T \left(L\,C\,L^T + \lambda^2 I_d\right)^{-1} =
  \left(C\,L^T L + \lambda^2 I_m\right)^{-1} C\,L^T\;,
\end{displaymath}
(28)
which is derived analogously to formula (27). Iterative
optimization methods that do not require exact adjoint operators
[e.g., the method of conjugate directions (Fomel, 1996)]
could be employed for this task.
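As an additional illustration (with arbitrary operators, not from the
original text), the following Python sketch checks equation (28)
numerically, confirming that the pair L and C L^T reproduces the
preconditioned estimate even though C L^T is not the exact adjoint of L.

# Numerical check of equation (28); illustrative quantities only.
import numpy as np

rng = np.random.default_rng(4)
nd, nm = 8, 12
L = rng.standard_normal((nd, nm))
lam = 0.5
d = rng.standard_normal(nd)

P = np.tril(np.ones((nm, nm)))       # hypothetical preconditioner
C = P @ P.T

lhs = C @ L.T @ np.linalg.solve(L @ C @ L.T + lam**2 * np.eye(nd), d)
rhs = np.linalg.solve(C @ L.T @ L + lam**2 * np.eye(nm), C @ L.T @ d)

print(np.max(np.abs(lhs - rhs)))     # the two sides coincide numerically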
Though the final results of the model-space and data-space
regularization are identical, the effect of preconditioning may alter
the behavior of iterative gradient-based methods, such as the method
of conjugate gradients. Although the objective functions are equal,
their gradients with respect to the model parameters are different.
Note, for example, that the first iteration of the model-space
regularization yields $L^T d$ as the model estimate regardless of the
regularization operator, while the first iteration of the data-space
(preconditioned) regularization yields $C\,L^T d$, which is a
``simplified'' version of the model. Since iteration to the exact
solution is never achieved in large-scale problems, the results of
iterative optimization may
turn out quite differently. Harlan (1995) points out that the two
components of the model-space regularization [equations
(1) and (2)] conflict with each other: the
first one enforces ``details'' in the model, while the second one
tries to smooth them out. He describes the advantage of
preconditioning:
The two objective functions produce different results when
optimization is incomplete. A descent optimization of the original
(model-space - S.F.) objective function will begin with
complex perturbations of the model and slowly converge toward an
increasingly simple model at the global minimum. A descent
optimization of the revised (data-space - S.F.) objective
function will begin with simple perturbations of the model and
slowly converge toward an increasingly complex model at the global
minimum. ... A more economical implementation can use fewer
iterations. Insufficient iterations result in an insufficiently
complex model, not in an insufficiently simplified model.
Examples in the next section illustrate these conclusions.