We want to solve the problem (Mst) in an iterative fashion, so we assume that the updates to the solution take the general form

**s**_n = **s**_n-1 + α_n**p**_n-1,

where **s**_n-1 is the preceding estimate of the solution **s**, **s**_n is the new estimate of **s**, **p**_n-1 is some direction to be specified in the model space, and α_n is an optimization parameter (or direction weight factor).
Defining the residual data error as
**r**_n = **t** - **M****s**_n, we find the
general relation that

**r**_n = **r**_n-1 - α_n**M****p**_n-1.
One useful way to proceed is to choose the optimization parameter
α_n so that the residual vector is decreased and preferably
minimized at each step of the iteration scheme. Using the standard
inner product notation (·,·) and considering

||**r**_n||^2 = ||**r**_n-1||^2 -
2α_n(**r**_n-1,**M****p**_n-1) +
α_n^2||**M****p**_n-1||^2,
we find easily that the optimum choice of α_n using this criterion
is

α_n = (**r**_n-1,**Mp**_n-1) / ||**Mp**_n-1||^2.
This formula has the significance that, whenever the residual
**r**_n-1 has a component along the direction **Mp**_n-1, α_n is chosen to scale **Mp**_n-1 so that this component exactly
cancels and therefore removes the contribution to **r**_n made by
**Mp**_n-1. This result implies therefore that, if
**Mp**_n-1 ≠ **0**, then with this choice of α_n we have

(**r**_n, **Mp**_n-1) = (**M**^T**r**_n,**p**_n-1) = 0.
We used the adjoint property of the inner product in (porthoggrad)
to show that **p**_n-1 is orthogonal to the
gradient vector **g**_n = **M**^T**r**_n, so-called
because it is the gradient obtained by taking the derivative
with respect to **s** of the squared residual
error functional associated with (Mst).
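The update step, the optimal choice of α_n, and the resulting orthogonality relation can be checked numerically. Below is a minimal sketch: the matrix **M**, data vector (called `t` here), and starting vectors are arbitrary example values, not taken from the text.

```python
import numpy as np

# Illustrative problem data (not from the source text).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))
t = rng.standard_normal(5)
s_prev = rng.standard_normal(3)      # s_{n-1}, preceding estimate
p_prev = rng.standard_normal(3)      # p_{n-1}, chosen direction

r_prev = t - M @ s_prev              # residual r_{n-1} = t - M s_{n-1}
Mp = M @ p_prev
alpha = (r_prev @ Mp) / (Mp @ Mp)    # optimal alpha_n from the formula above

s_new = s_prev + alpha * p_prev      # s_n = s_{n-1} + alpha_n p_{n-1}
r_new = t - M @ s_new                # recompute the residual directly

# The recursion r_n = r_{n-1} - alpha_n M p_{n-1} matches the direct residual,
# and r_n is orthogonal to M p_{n-1}; equivalently, M^T r_n is orthogonal
# to p_{n-1} by the adjoint property of the inner product.
assert np.allclose(r_new, r_prev - alpha * Mp)
assert abs(r_new @ Mp) < 1e-10
assert abs((M.T @ r_new) @ p_prev) < 1e-10
```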

Thus, at each step of this iterative sequence a vector proportional to some vector **p**_n-1 is added to the solution, while a vector proportional to **Mp**_n-1 is subtracted from the residual. According to formulas (sqres) and (optimumalpha), the squared norm of the residual decreases at each iteration as

||**r**_n||^2 = ||**r**_n-1||^2 -
(**r**_n-1,**Mp**_n-1)^2 / ||**Mp**_n-1||^2.
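This decrease formula can be verified step by step. The sketch below uses the gradient itself as the direction at each iteration (a steepest-descent choice; all inputs are illustrative example values, not from the text):

```python
import numpy as np

# Illustrative overdetermined system (not from the source text).
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 4))
t = rng.standard_normal(6)

s = np.zeros(4)
r = t - M @ s                        # initial residual

for _ in range(5):
    p = M.T @ r                      # steepest-descent direction p = g = M^T r
    Mp = M @ p
    alpha = (r @ Mp) / (Mp @ Mp)     # optimal alpha_n
    # Predicted new squared norm from the decrease formula above.
    predicted = r @ r - (r @ Mp) ** 2 / (Mp @ Mp)
    s = s + alpha * p
    r = r - alpha * Mp
    # The squared residual norm drops by exactly the predicted amount.
    assert np.isclose(r @ r, predicted)
```

Since (**r**_n-1,**Mp**_n-1) = ||**g**_n-1||^2 > 0 for this choice of direction, the decrease is strict at every step until the gradient vanishes.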
The sequence of directions **p**_n will be most efficient if the vectors **Mp**_n used
in decimating the residual are orthogonal, *i.e.*, if

(**Mp**_n,**Mp**_j) = 0 for j=1,2,...,n-1.
In this case, as follows by induction from formula
(porthoggrad), the residual vector **r**_n is also orthogonal to all those
vectors:

(**r**_n,**Mp**_j) = 0 for j=1,2,...,n-1.
Using again the adjoint relation for the inner product, we find
that

(**M**^T**r**_n,**p**_j) = (**g**_n,**p**_j) = 0
for j=1,2,...,n-1,

and

(**p**_n,**M**^T**Mp**_j) = 0 for
j=1,2,...,n-1,
which is a statement of conjugacy for the vectors **p**_n. Conjugacy is
just a generalization of orthogonality in which the vectors
are orthogonal relative to the nonstandard inner product
(**x**,**Ay**) - with **A** = **M**^T**M** being a symmetric,
positive semidefinite matrix (operator) - instead of the standard
inner product given by (**x**,**y**), i.e., with **A** replaced by
the identity.
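One way to see conjugacy in action is a CGLS-style recursion, which is one standard construction of such direction vectors (this particular recursion is a sketch under illustrative data, not taken from the text):

```python
import numpy as np

# Illustrative least-squares problem (not from the source text).
rng = np.random.default_rng(2)
M = rng.standard_normal((8, 4))
t = rng.standard_normal(8)

s = np.zeros(4)
r = t - M @ s
g = M.T @ r                          # gradient vector g = M^T r
p = g.copy()                         # first direction is the gradient
directions = []

for _ in range(4):
    directions.append(p.copy())
    Mp = M @ p
    alpha = (g @ g) / (Mp @ Mp)
    s += alpha * p
    r -= alpha * Mp
    g_new = M.T @ r
    beta = (g_new @ g_new) / (g @ g)
    p = g_new + beta * p             # new direction mixes gradient and old p
    g = g_new

# All pairs of directions are conjugate: (p_i, M^T M p_j) = 0 for i != j.
A = M.T @ M
for i in range(len(directions)):
    for j in range(i):
        assert abs(directions[i] @ A @ directions[j]) < 1e-8
```

Because the directions are **M**^T**M**-conjugate, the iteration reaches the least-squares solution of this 4-unknown problem in at most 4 steps.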

We conclude that conjugacy is a desirable property of the set of direction vectors **p**_n, so our next necessary step in order to obtain a definite iterative process is to construct a convenient sequence of vectors that have this property. One set of model vectors that will be available in this iteration sequence is the set of gradient vectors **g**_n themselves, where **g**_n = **M**^T**r**_n. We show next why this set plays an important role in constructing the desired sequence.

11/11/1997