Resolution of the normal weighted equations

Next: IRLS ALGORITHM APPLIED TO L Up: L^p MINIMIZATION WITH THE Previous: IRLS algorithm

Resolution of the normal weighted equations

We still have to solve the normal equations (6), at each step of the general algorithm. I omit the indexes k and k+1 in what follows.

The matrix A^TWA is symmetric and positive. However, it does not have a particular structure (such as the Toeplitz structure of A^TA in deconvolution). The computations of A^TWA and of its inverse are very expensive, in time and in memory. That is why several authors (Yarlagadda et al., 1985; Gersztenkorn et al., 1986; Scales et al., 1988) prefer iterative methods, especially conjugate gradient (``CG'') methods, to direct inversion methods, like singular-value decomposition or orthogonalization by Givens rotations.

The advantages of conjugate-gradient methods are numerous. The matrix A^TWA does not need to be computed and stored. The algorithm ``only'' requires the computation of the gradient g = A^TW.e (e: residuals) and of WA.h (h: conjugate direction). The vectors e and h are updated at each iteration of the CG algorithm. W being diagonal, the product between W and a vector is trivial. Moreover, if the matrix A is sparse, its product by a vector can also be easily computed. If the problem is ill-conditioned, limiting the number of iterations is supposed to have an effect similar to that produced by adding a damping factor.

Finally, the convergence of the CG algorithm is controlled by three factors:

the condition number of the problem;
the choice of an initial estimate x⁰ of x; the problem is modified into the resolution of: $(y-Ax^0) - A.\delta x$ = 0.
the structure of the eigenspectrum of A^TWA itself: the more gathered the eigenvalues are, the quicker the algorithm will converge.

In theory, if the matrix A^TWA has N different non-zero eigenvalues, the CG algorithm should converge without round-off errors in N iterations.

I will use the conjugate-gradient method in the IRLS algorithm applied to deconvolution. However, as I will show, its efficiency will not be optimal, unless some precautions are taken. For what follows, I mainly consider the case of the L¹ deconvolution, with p=1.

Next: IRLS ALGORITHM APPLIED TO L Up: L^p MINIMIZATION WITH THE Previous: IRLS algorithm

Stanford Exploration Project
1/13/1998