We want to solve the problem (Mst) in an iterative fashion, so we assume that the updates to the solution take the general form
$$s_n = s_{n-1} + \alpha_n p_{n-1},$$
where $s_{n-1}$ is the preceding estimate of the solution $s$, $s_n$ is the new estimate of $s$, $p_{n-1}$ is some direction to be specified in the model space, and $\alpha_n$ is an optimization parameter (or direction weight factor). Defining the residual data error as $r_n = t - Ms_n$, we find the general relation that
$$r_n = r_{n-1} - \alpha_n M p_{n-1}.$$
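As a concrete check of this recursion, the NumPy sketch below applies one update; the operator $M$, data $t$, direction $p$, and weight $\alpha$ are arbitrary toy values invented for illustration:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))   # toy forward operator in (Mst)
t = rng.standard_normal(5)        # toy data vector
s_prev = np.zeros(3)              # preceding estimate s_{n-1}
p = rng.standard_normal(3)        # some direction p_{n-1} in model space
alpha = 0.1                       # weight alpha_n, arbitrary at this point

s_new = s_prev + alpha * p        # s_n = s_{n-1} + alpha_n p_{n-1}
r_prev = t - M @ s_prev           # r_{n-1} = t - M s_{n-1}
r_new = t - M @ s_new             # r_n from its definition
assert np.allclose(r_new, r_prev - alpha * (M @ p))  # matches the recursion
\end{verbatim}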
One useful way to proceed is to choose the optimization parameter $\alpha_n$ so that the residual vector is decreased and preferably minimized at each step of the iteration scheme. Using the standard inner product notation $(a,b)$ and considering
$$\|r_n\|^2 = \|r_{n-1}\|^2 - 2\alpha_n (r_{n-1}, Mp_{n-1}) + \alpha_n^2 \|Mp_{n-1}\|^2, \tag{sqres}$$
we find easily that the optimum choice of $\alpha_n$ using this criterion is
$$\alpha_n = \frac{(r_{n-1}, Mp_{n-1})}{\|Mp_{n-1}\|^2}. \tag{optimumalpha}$$
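The same kind of toy setup can be used to confirm (sqres) and (optimumalpha) numerically; every concrete value below is again fabricated for illustration:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 3))
t = rng.standard_normal(5)
p = rng.standard_normal(3)

r_prev = t - M @ np.zeros(3)      # residual at the previous estimate
Mp = M @ p
alpha = np.dot(r_prev, Mp) / np.dot(Mp, Mp)   # (optimumalpha)

r_new = r_prev - alpha * Mp
# Squared norm of the new residual agrees with (sqres) at the optimum:
expected = np.dot(r_prev, r_prev) - np.dot(r_prev, Mp)**2 / np.dot(Mp, Mp)
assert np.isclose(np.dot(r_new, r_new), expected)
\end{verbatim}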
This formula has the significance that, whenever the residual $r_{n-1}$ has a component along the direction $Mp_{n-1}$, $\alpha_n$ is chosen to scale $Mp_{n-1}$ so that this component exactly cancels, thereby removing the contribution to $r_n$ made by $Mp_{n-1}$. This result implies that, if $Mp_{n-1} \neq 0$, then with this choice of $\alpha_n$ we have
$$(r_n, Mp_{n-1}) = (M^T r_n, p_{n-1}) = 0. \tag{porthoggrad}$$
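To make the cancellation explicit, take the inner product of the residual recursion with $Mp_{n-1}$ and substitute (optimumalpha); no new symbols are involved:
$$(r_n, Mp_{n-1}) = (r_{n-1}, Mp_{n-1}) - \alpha_n \|Mp_{n-1}\|^2 = (r_{n-1}, Mp_{n-1}) - \frac{(r_{n-1}, Mp_{n-1})}{\|Mp_{n-1}\|^2}\,\|Mp_{n-1}\|^2 = 0.$$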
We used the adjoint property of the inner product in (porthoggrad) to show that $p_{n-1}$ is orthogonal to the gradient vector $g_n = M^T r_n$, so-called because it is the gradient obtained by taking the derivative with respect to $s$ of the squared residual error functional associated with (Mst). Thus, at each step of this iterative sequence a vector proportional to some vector $p_{n-1}$ is added to the solution, while a vector proportional to $Mp_{n-1}$ is subtracted from the residual.
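For reference, the derivative in question is a one-line computation from the definition $r_n = t - Ms_n$:
$$\nabla_s\, \|t - Ms\|^2 \Big|_{s = s_n} = -2\,M^T (t - M s_n) = -2\,M^T r_n,$$
so $g_n = M^T r_n$ is, up to the constant factor $-2$, the gradient of the squared residual error functional.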
According to formulas (sqres) and (optimumalpha), the squared norm of the residual decreases at each iteration as
$$\|r_n\|^2 = \|r_{n-1}\|^2 - \frac{(r_{n-1}, Mp_{n-1})^2}{\|Mp_{n-1}\|^2}.$$
The sequence of directions will be most efficient if the vectors used in decimating the residual are orthogonal, i.e., if
$$(Mp_n, Mp_j) = 0 \quad \text{for } j = 1, 2, \ldots, n-1.$$
In this case, as follows by induction from formula (porthoggrad), the residual vector is also orthogonal to all those vectors:
$$(r_n, Mp_j) = 0 \quad \text{for } j = 1, 2, \ldots, n-1.$$
Using again the adjoint relation for the inner product, we find that
$$(M^T r_n, p_j) = (g_n, p_j) = 0 \quad \text{for } j = 1, 2, \ldots, n-1,$$
and
$$(p_n, M^T M p_j) = 0 \quad \text{for } j = 1, 2, \ldots, n-1,$$
which is a statement of conjugacy for the vectors $p_n$. Conjugacy is just a generalization of orthogonality in which the vectors are orthogonal relative to the nonstandard inner product $(a, M^T M b)$, with $M^T M$ being a symmetric, positive semidefinite matrix (operator), instead of the standard inner product given by $(a, b)$, i.e., with $M^T M$ replaced by the identity.
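The adjoint identity underlying this equivalence, $(Ma, Mb) = (a, M^T M b)$, is easy to check numerically. In the sketch below the matrix and vectors are arbitrary toy data, invented purely for illustration:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 3))   # toy operator
a = rng.standard_normal(3)
b = rng.standard_normal(3)

# Orthogonality of the images M a and M b is the same statement as
# conjugacy of a and b with respect to M^T M:
assert np.isclose(np.dot(M @ a, M @ b), np.dot(a, M.T @ M @ b))
\end{verbatim}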
We conclude that conjugacy is a desirable property of the set of direction vectors $p_n$, so our next step in obtaining a definite iterative process is to construct a convenient sequence of vectors that have this property. One set of model vectors that will be available in this iteration sequence is the set of gradient vectors themselves, where $g_n = M^T r_n$. We show next why this set plays an important role in constructing the desired sequence.
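To preview how the pieces assemble, here is a minimal NumPy sketch that runs the iteration using the gradient vectors themselves as directions (plain steepest descent; no conjugacy is imposed on the $p_n$ yet, so this is not the final scheme). The function name and the random test problem are invented for illustration:
\begin{verbatim}
import numpy as np

def gradient_iteration(M, t, n_steps=100):
    """Iterate s_n = s_{n-1} + alpha_n p_{n-1} with p = M^T r."""
    s = np.zeros(M.shape[1])
    r = t - M @ s                       # initial residual
    for _ in range(n_steps):
        p = M.T @ r                     # gradient direction g = M^T r
        Mp = M @ p
        denom = np.dot(Mp, Mp)
        if denom == 0.0:                # M p = 0: no further decrease
            break
        alpha = np.dot(r, Mp) / denom   # optimal step (optimumalpha)
        s += alpha * p                  # model update
        r -= alpha * Mp                 # residual recursion
    return s, r

rng = np.random.default_rng(3)
M = rng.standard_normal((8, 4))
t = rng.standard_normal(8)
s, r = gradient_iteration(M, t)
print(np.linalg.norm(M.T @ r))          # ~0 once the gradient vanishes
\end{verbatim}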