In an abstract setting,
let *d*_{i} for *i* = 1, …, *m* be a set of *m* measurements. This data may be
organized into a data *m*-vector **d**. Let **s** be an *n*-vector containing the *n* parameters for the model/solution
space to be constructed.
Then, let *N*_{i}(**s**) be a (possibly nonlinear) function of the
model parameter vector such that *d*_{i} = *N*_{i}(**s**) in infinite-precision forward modeling of the physical problem of interest.
If the data *d*_{i} and the functions *N*_{i} are known, then the inversion
problem is to find an approximate **s** that satisfies the data
(*i.e.*, has as small a data misfit as possible) in some sense.

If life were simple, the operators *N*_{i} would be analytically invertible,
so one might think it possible to
write the solution to the inverse problem as **s** = *N*_{i}^{-1}(*d*_{i}). However, this is certainly a strange-looking formula:
while it is perfectly sensible that the domain of each function *N*_{i}
is a set of *n* model parameters, and that the range is
a single datum, the postulated inverse function, with multiple model outputs from single-datum
inputs, is clearly not sensible. In most cases, there will simply
not be sufficient information
contained in a single datum to determine multiple model parameters.

For strictly linear problems, there might exist an appropriate
operator *X* acting on the full data *m*-vector such that
**s** = *X***d** produces the model *n*-vector. However, even for
linear problems, the existence of such an operator is not
guaranteed, since the sizes *m* and *n* of the two vector spaces will generally differ.
This difficult situation leads to the introduction of pseudoinverses in linear
problems, or to the methods to be described now in nonlinear problems.
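For intuition, the Moore-Penrose pseudoinverse supplies exactly such an operator *X* in the linear least-squares sense. A minimal sketch (assuming Python with NumPy; the operator and data below are invented for illustration):

```python
import numpy as np

# Overdetermined linear problem: m = 5 data, n = 2 model parameters,
# so no exact inverse of L exists. The pseudoinverse X = pinv(L)
# supplies the least-squares operator s = X d.
rng = np.random.default_rng(0)
L = rng.standard_normal((5, 2))      # forward operator (m x n)
s_true = np.array([1.0, -2.0])
d = L @ s_true                       # noise-free synthetic data

X = np.linalg.pinv(L)                # n x m pseudoinverse
s_est = X @ d
print(np.allclose(s_est, s_true))    # True: data lie in the range of L
```

Because the synthetic data here are noise-free and lie in the range of *L*, the pseudoinverse recovers the model exactly; with noisy data it returns the least-squares fit instead.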

So, instead of relying on some hypothetical analytical inverse, the desired solution to the inverse problem will make simultaneous use of all (or at least most) of the data while determining some sort of optimal fit to a chosen set of model parameters. This situation is common in data analysis and often leads to the formulation of least-squares methods for this type of inversion problem.

An approach to this problem based on nonlinear least-squares inversion considers the nonlinear function

$$
F(\mathbf{s}) = \sum_{i=1}^{m} \left[ d_i - N_i(\mathbf{s}) \right]^2,
\tag{Fofs}
$$

which is simply the sum of the squares of the data residuals
*r*_{i} = *d*_{i} - *N*_{i}(**s**).
I formulate an "evolution" equation for the model vector
**s**(τ), where τ is an evolution parameter
treated as a continuous generalization of an iteration number.
In particular, I must ultimately discretize this parameter in
my final numerical method, in which case the particular values
of τ will be exactly the iteration numbers.
But for purposes of explaining the method, it will prove useful to
treat τ initially as a continuous parameter.
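As a concrete illustration of the misfit functional (Fofs), here is a small sketch in Python/NumPy; the two-parameter exponential-decay forward model is my own toy choice, not part of the text:

```python
import numpy as np

# Hypothetical forward model (not from the text):
# N_i(s) = s[0] * exp(-s[1] * t_i), a two-parameter decay curve
# sampled at m = 8 measurement points t_i.
t = np.linspace(0.0, 2.0, 8)

def forward(s):
    return s[0] * np.exp(-s[1] * t)        # N(s), an m-vector

s_true = np.array([2.0, 1.5])
d = forward(s_true)                        # synthetic data d_i = N_i(s_true)

def misfit(s):
    r = d - forward(s)                     # residuals r_i = d_i - N_i(s)
    return np.sum(r**2)                    # F(s), sum of squared residuals

print(misfit(s_true))                      # 0.0 at the true model
print(misfit(np.array([1.0, 1.0])) > 0)    # True: positive elsewhere
```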

The thrust of the nonlinear least-squares method is to produce a model that minimizes the least-squares error function *F*(**s**). One way to guarantee [see, for example, Jeffrey and Rosner (1986) and Lu and Berryman (1991)] that a local minimum is achieved is to pose the evolution equation for **s**(τ) in a way that guarantees the value of *F* decreases monotonically as τ increases (or at each iteration step for the discretized problem). Taking the derivative of *F* with respect to the evolution parameter, the chain rule gives

$$
\frac{d F(\mathbf{s})}{d\tau}
= -2 \sum_{j=1}^{n} \sum_{i=1}^{m}
\left[ d_i - N_i(\mathbf{s}) \right]
\frac{\partial N_i}{\partial s_j}
\frac{d s_j}{d\tau}.
\tag{derivativeofF}
$$
It is desired that the evolution equation for **s**(τ) be chosen to guarantee
that (derivativeofF) is nonpositive. It is easy to see that
(derivativeofF) will always be negative or zero if the evolution
equations for the model parameters *s*_{j} are chosen to be

$$
\frac{d s_j}{d\tau}
= \lambda \sum_{i=1}^{m}
\left[ d_i - N_i(\mathbf{s}) \right]
\frac{\partial N_i}{\partial s_j},
\tag{evolutioneqn}
$$

where λ is a positive parameter to be determined.
Then,

$$
\frac{d F(\mathbf{s})}{d\tau}
= -2 \lambda \sum_{j=1}^{n}
\left[ \sum_{i=1}^{m}
\left[ d_i - N_i(\mathbf{s}) \right]
\frac{\partial N_i}{\partial s_j} \right]^2,
\tag{derivativeofF2}
$$

so each term in the sum over *j* in (derivativeofF2) is a squared
quantity and must be positive or vanish identically. The choice
(evolutioneqn) of
evolution then clearly guarantees the desired monotonically decreasing
behavior of the total squared-error functional as τ → ∞.

To implement this procedure, a choice of discretization must be made, together with a choice for λ. The obvious choice of discretization for the evolution equation (evolutioneqn) is

$$
s_j(k+1) - s_j(k) \equiv \Delta s_j
= \lambda \sum_{i=1}^{m}
\left[ d_i - N_i(\mathbf{s}(k)) \right]
\left. \frac{\partial N_i(\mathbf{s})}{\partial s_j}
\right|_{\mathbf{s} = \mathbf{s}(k)},
\tag{discreteevolution}
$$
where I have taken the finite step size for the evolution parameter to be
Δτ = 1. In the continuous evolution problem, the
infinitesimal changes in the evolution parameter guarantee similarly
infinitesimal changes in the model vector **s**(τ), and this makes the
choice of λ largely arbitrary. In contrast, for the
discretized problem, the evolution of **s** is finite at each step,
and care must be taken not to violate the desired condition that the
least-squares functional should decrease at each step. Such
violations may occur if the step size is taken too large.
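A minimal sketch of one discretized step (discreteevolution), again using the hypothetical two-parameter decay model; the step size `lam` is fixed by hand here, ahead of the optimal choice derived below:

```python
import numpy as np

# One discretized evolution step for the toy decay model
# N_i(s) = s0 * exp(-s1 * t_i), with a hand-coded Jacobian.
t = np.linspace(0.0, 2.0, 8)

def forward(s):
    return s[0] * np.exp(-s[1] * t)

def jacobian(s):
    e = np.exp(-s[1] * t)
    return np.column_stack([e, -s[0] * t * e])   # dN_i/ds_j, an m x n matrix

d = forward(np.array([2.0, 1.5]))                # synthetic data

def F(s):
    return np.sum((d - forward(s))**2)

def step(s, lam):
    r = d - forward(s)                           # residuals at s(k)
    return s + lam * (jacobian(s).T @ r)         # Delta s_j per (discreteevolution)

s = np.array([1.0, 1.0])
print(F(step(s, 0.05)) < F(s))                   # True: small step decreases F
```

With a much larger `lam` the same update can overshoot and increase *F*, which is exactly the violation warned about above.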

Reconsidering (Fofs), I find that, by keeping only those terms proportional to the first and second powers of the components of Δ**s**, I have

$$
\begin{aligned}
F(\mathbf{s}(k+1))
&= \sum_{i=1}^{m} \left[ d_i - N_i(\mathbf{s}(k) + \Delta\mathbf{s}) \right]^2 \\
&\simeq \sum_{i=1}^{m} \left[ d_i - N_i(\mathbf{s}(k)) \right]^2
- 2 \sum_{j=1}^{n} \sum_{i=1}^{m}
\left[ d_i - N_i(\mathbf{s}(k)) \right]
\frac{\partial N_i}{\partial s_j} \, \Delta s_j
+ \sum_{i=1}^{m}
\left[ \sum_{j=1}^{n}
\frac{\partial N_i}{\partial s_j} \, \Delta s_j \right]^2.
\end{aligned}
\tag{discreteFofs}
$$
After substituting (discreteevolution),
the parameter λ can now be chosen so that the right-hand side
of (discreteFofs) decreases as much as possible at each step of
the iteration scheme. The optimum choice is easily shown to be

$$
\lambda(k+1) =
\frac{\displaystyle \sum_{j=1}^{n}
\left[ \sum_{i=1}^{m}
\left[ d_i - N_i(\mathbf{s}(k)) \right]
\frac{\partial N_i}{\partial s_j} \right]^2}
{\displaystyle \sum_{j,j'}
\left[ \sum_{i'=1}^{m}
\left[ d_{i'} - N_{i'}(\mathbf{s}(k)) \right]
\frac{\partial N_{i'}}{\partial s_j} \right]
\left[ \sum_{i=1}^{m}
\frac{\partial N_i}{\partial s_j}
\frac{\partial N_i}{\partial s_{j'}} \right]
\left[ \sum_{i''=1}^{m}
\left[ d_{i''} - N_{i''}(\mathbf{s}(k)) \right]
\frac{\partial N_{i''}}{\partial s_{j'}} \right]},
\tag{lambda}
$$

since this minimizes the right-hand side of (discreteFofs).

It will prove enlightening to compare the procedure just presented with the well-known method of conjugate gradients (Hestenes and Stiefel, 1952; Fomel, 1996) for a linear operator **L** such that **d** = **L****s**. Then, model updates are obtained using

$$
\mathbf{s}^{(k+1)} = \mathbf{s}^{(k)} + \lambda^{(k+1)} \mathbf{u}^{(k+1)},
$$

where, for the discrete iterative problem, I put the iteration numbers
in superscripts. The new vector **u**^{(k+1)} is the (somehow) known
update direction in the model space, and λ^{(k+1)} is a parameter
used to optimize the step size. The updated residual vector is again given by

$$
\mathbf{r}^{(k+1)} = \mathbf{d} - \mathbf{L}\mathbf{s}^{(k+1)}
= \mathbf{r}^{(k)} - \lambda^{(k+1)} \mathbf{L}\mathbf{u}^{(k+1)}.
$$
The magnitude of the residual vector is easily shown to decrease most
at each step of the iteration sequence if the optimization parameter satisfies

$$
\lambda^{(k+1)} =
\frac{\left( \mathbf{r}^{(k)}, \mathbf{L}\mathbf{u}^{(k+1)} \right)}
{\left\| \mathbf{L}\mathbf{u}^{(k+1)} \right\|^2}.
\tag{cgparameter}
$$
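A quick numerical check (with a random operator, residual, and direction invented for illustration) that (cgparameter) indeed minimizes the updated residual norm along the chosen direction:

```python
import numpy as np

# For a linear operator L, the step lam = (r, L u) / ||L u||^2
# minimizes ||r - lam * L u|| over all step lengths lam.
rng = np.random.default_rng(1)
L = rng.standard_normal((6, 3))      # linear forward operator
r = rng.standard_normal(6)           # current residual r^(k)
u = rng.standard_normal(3)           # some update direction u^(k+1)

Lu = L @ u
lam = (r @ Lu) / (Lu @ Lu)           # equation (cgparameter)

# Any nearby step length gives a larger updated residual.
best = np.linalg.norm(r - lam * Lu)
for eps in (-0.1, 0.1):
    assert np.linalg.norm(r - (lam + eps) * Lu) > best
print("optimal step verified")
```

The check works because the updated residual norm is a strictly convex quadratic in the step length, so the stationary point of (cgparameter) is its unique minimum.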
Equation (cgparameter) is exactly analogous to the formula (lambda)
obtained in the nonlinear least-squares problem if the components of
the residual are given by

$$
r_i^{(k)} = d_i - N_i(\mathbf{s}^{(k)}),
$$
the matrix elements of the linear operator are

$$
(\mathbf{L})_{ij} = L_{ij}
= \left. \frac{\partial N_i}{\partial s_j}
\right|_{\mathbf{s} = \mathbf{s}^{(k)}},
$$
and the update direction vector satisfies

$$
\lambda^{(k+1)} \mathbf{u}^{(k+1)} = \Delta\mathbf{s},
\tag{update}
$$

where the components of the right-hand side of (update) are
given by (evolutioneqn) evaluated at **s** = **s**^{(k)}. With this identification, it becomes
clear that the nonlinear least-squares method outlined above is one natural
generalization of the well-known conjugate gradients technique.
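To make the identification concrete, here is a small numerical check (using the same toy decay model, my own choice) that (lambda) and (cgparameter) coincide once **r**, **L**, and **u** are identified as above:

```python
import numpy as np

# With r_i = d_i - N_i(s), L_ij = dN_i/ds_j, and the steepest-descent
# direction u = L^T r, the nonlinear step ||g||^2 / ||L g||^2 of
# (lambda) equals the linear step (r, L u) / ||L u||^2 of (cgparameter).
t = np.linspace(0.0, 2.0, 8)

def forward(s):
    return s[0] * np.exp(-s[1] * t)

def jacobian(s):
    e = np.exp(-s[1] * t)
    return np.column_stack([e, -s[0] * t * e])

d = forward(np.array([2.0, 1.5]))             # synthetic data
s = np.array([1.0, 1.0])                      # current model s^(k)

r = d - forward(s)                            # residual r^(k)
L = jacobian(s)                               # linearized operator at s^(k)
u = L.T @ r                                   # update direction (evolutioneqn)

lam_nonlinear = (u @ u) / np.sum((L @ u)**2)  # formula (lambda)
lam_cg = (r @ (L @ u)) / np.sum((L @ u)**2)   # formula (cgparameter)
print(np.isclose(lam_nonlinear, lam_cg))      # True: the two agree
```

The agreement is exact here because, with **u** = **L**^{T}**r**, both numerators reduce to ||**L**^{T}**r**||².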

11/12/1997