We want to solve the problem (Mst) in an iterative fashion, so we assume that the updates to the solution take the general form
$$ s_n = s_{n-1} + \alpha_n p_{n-1}, $$
where $s_{n-1}$ is the preceding estimate of the solution $s$, $s_n$ is the new estimate of $s$, $p_{n-1}$ is some direction to be specified in the model space, and $\alpha_n$ is an optimization parameter (or direction weight factor). Defining the residual data error as $r_n = t - M s_n$, we find the general relation that
$$ r_n = r_{n-1} - \alpha_n M p_{n-1}. $$
One useful way to proceed is to choose the optimization parameter $\alpha_n$ so that the norm of the residual vector is decreased and preferably minimized at each step of the iteration scheme. Using the standard inner product notation and considering
$$ \| r_n \|^2 = \| r_{n-1} \|^2 - 2 \alpha_n \, ( r_{n-1}, M p_{n-1} ) + \alpha_n^2 \, \| M p_{n-1} \|^2, \tag{sqres} $$
we find easily that the optimum choice of $\alpha_n$ using this criterion is
$$ \alpha_n = \frac{( r_{n-1}, M p_{n-1} )}{\| M p_{n-1} \|^2}. \tag{optimumalpha} $$
This formula has the significance that, whenever the residual $r_{n-1}$ has a component along the direction $M p_{n-1}$, $\alpha_n$ is chosen to scale $M p_{n-1}$ so that this component cancels exactly, thereby removing the contribution to $r_{n-1}$ made by $M p_{n-1}$. This result implies that, if $\alpha_n$ is given by (optimumalpha), then with this choice of $\alpha_n$ we have
$$ ( r_n, M p_{n-1} ) = ( M^T r_n, p_{n-1} ) = 0. \tag{porthoggrad} $$
We used the adjoint property of the inner product in (porthoggrad) to show that $p_{n-1}$ is orthogonal to the gradient vector $g_n = M^T r_n$, so called because it is the gradient obtained by taking the derivative with respect to $s$ of the squared residual error functional associated with (Mst).
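To make one step of this procedure concrete, here is a minimal numerical sketch of a single iteration using the optimal step (optimumalpha), assuming $M$ is a dense matrix stored as a NumPy array; the function and variable names are illustrative, not from the text.

```python
import numpy as np

def update_step(M, s, r, p):
    """One step of the generic scheme: s_n = s_{n-1} + alpha_n * p_{n-1},
    with alpha_n chosen by (optimumalpha) to minimize ||r_n||^2."""
    Mp = M @ p
    alpha = np.dot(r, Mp) / np.dot(Mp, Mp)  # (optimumalpha)
    return s + alpha * p, r - alpha * Mp    # new estimate and new residual

# Illustrative check of (porthoggrad): after the step the residual is
# orthogonal to M p, equivalently M^T r_n is orthogonal to p.
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 5))
t = rng.standard_normal(8)
s = np.zeros(5)
r = t - M @ s
p = M.T @ r                                 # any direction works; here the gradient
s, r = update_step(M, s, r, p)
print(np.dot(r, M @ p))                     # ~ 0 up to round-off
print(np.dot(M.T @ r, p))                   # the same number, by the adjoint property
```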
Thus, at each step of this iterative sequence a vector proportional to some vector $p_{n-1}$ is added to the solution, while a vector proportional to $M p_{n-1}$ is subtracted from the residual. According to formulas (sqres) and (optimumalpha), the squared norm of the residual decreases at each iteration as
$$ \| r_n \|^2 = \| r_{n-1} \|^2 - \frac{( r_{n-1}, M p_{n-1} )^2}{\| M p_{n-1} \|^2}. $$
The sequence of directions will be most efficient if the vectors $M p_n$ used in decimating the residual are orthogonal, i.e., if
$$ ( M p_n, M p_j ) = 0 \quad \text{for } j = 1, 2, \ldots, n-1. $$
In this case, as follows by induction from formula (porthoggrad), the residual vector $r_n$ is also orthogonal to all those vectors:
$$ ( r_n, M p_j ) = 0 \quad \text{for } j = 1, 2, \ldots, n-1. $$
Using again the adjoint relation for the inner product, we find that
$$ ( M^T r_n, p_j ) = ( g_n, p_j ) = 0 \quad \text{for } j = 1, 2, \ldots, n-1, $$
and
$$ ( p_n, M^T M p_j ) = 0 \quad \text{for } j = 1, 2, \ldots, n-1, $$
which is a statement of conjugacy for the vectors $p_n$. Conjugacy is just a generalization of orthogonality in which the vectors are orthogonal relative to the nonstandard inner product $( x, A y )$, with $A = M^T M$ a symmetric, positive semidefinite matrix (operator), instead of the standard inner product given by $( x, y )$, which corresponds to replacing $A$ by the identity.
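As a concrete illustration of conjugacy (a generic Gram-Schmidt-style construction, not the specific sequence the text builds next), the sketch below conjugates arbitrary candidate vectors in the $( x, M^T M y )$ inner product and then verifies both equivalent orthogonality statements; all names are illustrative.

```python
import numpy as np

def conjugate_set(M, candidates):
    """Turn candidate vectors into directions p_n satisfying
    (p_n, M^T M p_j) = 0 for all j < n, by subtracting from each
    candidate its components along the earlier directions in the
    nonstandard inner product (x, A y) with A = M^T M."""
    A = M.T @ M                              # symmetric, positive semidefinite
    ps = []
    for v in candidates:
        p = v.astype(float).copy()
        for q in ps:
            p -= (np.dot(p, A @ q) / np.dot(q, A @ q)) * q
        ps.append(p)
    return ps

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 5))
ps = conjugate_set(M, [rng.standard_normal(5) for _ in range(4)])
print(np.dot(ps[3], (M.T @ M) @ ps[1]))      # ~ 0: conjugacy
print(np.dot(M @ ps[3], M @ ps[1]))          # ~ 0: the same statement via the adjoint
```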
We conclude that conjugacy is a desirable property of the set of direction vectors $p_n$, so our next necessary step in order to obtain a definite iterative process is to construct a convenient sequence of vectors that have this property. One set of model vectors that will be available in this iteration sequence is the set of gradient vectors $g_n$ themselves, where $g_n = M^T r_n$. We show next why this set plays an important role in constructing the desired sequence.
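Anticipating that construction, the following sketch shows one standard way the pieces assemble into a complete method: each new direction is the current gradient $g_n = M^T r_n$, conjugated against the previous direction so that the conjugacy condition above holds. This is a common variant written for illustration under the same assumptions as before (dense NumPy arrays, hypothetical names); the text derives its own version in what follows.

```python
import numpy as np

def conjugate_gradient_sketch(M, t, n_iter):
    """Iterate s_n = s_{n-1} + alpha_n * p_{n-1}, building each direction
    from the gradient g = M^T r conjugated against the previous direction."""
    s = np.zeros(M.shape[1])
    r = t - M @ s
    p = M.T @ r                                  # first direction: the gradient itself
    for _ in range(n_iter):
        Mp = M @ p
        alpha = np.dot(r, Mp) / np.dot(Mp, Mp)   # optimal step (optimumalpha)
        s += alpha * p
        r -= alpha * Mp
        g = M.T @ r                              # new gradient
        beta = -np.dot(M @ g, Mp) / np.dot(Mp, Mp)  # enforce (M p_new, M p) = 0
        p = g + beta * p
    return s, r

rng = np.random.default_rng(1)
M = rng.standard_normal((20, 6))
t = rng.standard_normal(20)
s, r = conjugate_gradient_sketch(M, t, n_iter=6)
print(np.linalg.norm(M.T @ r))                   # ~ 0: the gradient vanishes at the minimum
```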