
Standard methods

The conjugate-direction method is really a family of methods. Mathematically, where there are n unknowns, these algorithms all converge to the answer in n (or fewer) steps. The various methods differ in numerical accuracy, treatment of underdetermined systems, accuracy in treating ill-conditioned systems, space requirements, and the number of dot products required. Technically, the method of conjugate directions (CD) used in the cgstep module is not the conjugate-gradient method itself, but is equivalent to it. This method is more properly called the conjugate-direction method with a memory of one step. I chose this method for its clarity and flexibility. If you would like a free introduction and summary of conjugate-gradient methods, I particularly recommend An Introduction to the Conjugate Gradient Method Without the Agonizing Pain by Jonathan Shewchuk, which you can download from http://www.cs.cmu.edu/afs/cs/project/quake/public/papers/painless-conjugate-gradient.ps.
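To make the idea concrete, here is a minimal real-valued NumPy sketch of a conjugate-direction solver with a memory of one step. It is an illustration in the spirit of cgstep, not the book's module itself; the arguments Fop and Fadj (forward and adjoint applications of F) and all other names are stand-ins invented for this sketch.

    import numpy as np

    def cd_solve(Fop, Fadj, d, m0, niter):
        """Conjugate directions with one-step memory for 0 ~ F m - d."""
        m = m0.copy()
        r = Fop(m) - d                  # residual r = F m - d
        s = np.zeros_like(m)            # previous step (model space)
        S = np.zeros_like(d)            # its image F s (data space)
        for it in range(niter):
            g = Fadj(r)                 # gradient direction in model space
            G = Fop(g)                  # its image in data space
            if it == 0:
                alpha = -np.dot(G, r) / np.dot(G, G)
                beta = 0.0
            else:
                # 2x2 least squares: minimize |r + alpha G + beta S|^2
                M = np.array([[G @ G, G @ S], [G @ S, S @ S]])
                alpha, beta = np.linalg.solve(M, -np.array([G @ r, S @ r]))
            s = alpha * g + beta * s    # remember the combined step
            S = alpha * G + beta * S
            m = m + s
            r = r + S                   # keeps r = F m - d up to date
        return m

Notice the two data-space temporaries G and S; the remark later in this section about cgstep needing two temporaries the size of data space refers to exactly this pair.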

I suggest you skip over the remainder of this section and return after you have seen many examples, developed some expertise, and encountered some technical problems.

The conjugate-gradient method was introduced by Hestenes and Stiefel in 1952. To read the standard literature and relate it to this book, you should first realize that when I write fitting goals like
\begin{eqnarray}
0 &\approx& \bold W( \bold F \bold m - \bold d ) \qquad (90) \\
0 &\approx& \bold A \bold m \qquad (91)
\end{eqnarray}
they are equivalent to minimizing the quadratic form:  
\begin{displaymath}
\bold m: \quad\quad
\min_{\bold m} Q(\bold m) \eq
( \bold F \bold m - \bold d )' \bold W' \bold W ( \bold F \bold m - \bold d )
\ +\ \bold m'\bold A'\bold A\bold m
\end{displaymath} (92)
The optimization theory (OT) literature starts from a minimization of  
\begin{displaymath}
\bold x: \quad\quad
\min_{\bold x} Q(\bold x) \eq \bold x'\bold H \bold x - \bold b' \bold x
\end{displaymath} (93)
To relate equation (92) to equation (93), expand the parentheses in (92) and abandon the constant term $\bold d'\bold d$. Then gather the term quadratic in $\bold m$ and the terms linear in $\bold m$. There are two terms linear in $\bold m$ that are transposes of each other; being scalars, they are equal. Thus, to invoke ``standard methods,'' you take your problem-formulation operators $\bold F$, $\bold W$, and $\bold A$ and create two modules that apply the operators
\begin{eqnarray}
\bold H &=& \bold F'\bold W'\bold W\bold F + \bold A'\bold A \qquad (94) \\
\bold b' &=& 2(\bold F'\bold W'\bold W\bold d)' \qquad (95)
\end{eqnarray}
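Carrying out that expansion explicitly makes the identification plain:

\begin{displaymath}
Q(\bold m) \eq
\bold m' (\bold F'\bold W'\bold W\bold F + \bold A'\bold A) \bold m
\;-\; 2 (\bold F'\bold W'\bold W\bold d)' \bold m
\;+\; \bold d'\bold d
\end{displaymath}

The quadratic term exhibits $\bold H$, the linear term exhibits $\bold b'$ (the two transposed linear terms having merged into the factor of 2), and the constant $\bold d'\bold d$ is abandoned.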
The operators $\bold H$ and $\bold b'$ operate on model space. Standard procedures do not require their adjoints because $\bold H$ is its own adjoint and $\bold b'$ reduces model space to a scalar. You can see that computing $\bold H$ and $\bold b'$ requires one temporary space the size of data space (whereas cgstep requires two).
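As a sketch of how these two modules plug into a standard solver, here is an illustration using SciPy's conjugate-gradient routine. The small dense F, W, A, and d are placeholders invented for this sketch; all that matters is that each operator and its adjoint can be applied. Setting the gradient of (93) to zero gives $\bold H \bold m = \bold F'\bold W'\bold W \bold d$, the factor of 2 in (95) canceling.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    rng = np.random.default_rng(0)
    F = rng.standard_normal((30, 10))   # placeholder modeling operator
    W = np.eye(30)                      # placeholder data weighting
    A = 0.1 * np.eye(10)                # placeholder regularization
    d = rng.standard_normal(30)

    def apply_H(m):
        # H = F'W'WF + A'A applied implicitly; W @ (F @ m) is the single
        # data-space temporary mentioned above.
        tmp = W @ (F @ m)
        return F.T @ (W.T @ tmp) + A.T @ (A @ m)

    H = LinearOperator((10, 10), matvec=apply_H)
    b = F.T @ (W.T @ (W @ d))           # right-hand side F'W'W d
    m_est, info = cg(H, b)              # info == 0 signals convergence

Forming $\bold H$ as an explicit 10-by-10 matrix would also work at this toy size, but the implicit form is what scales to real problems.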

When people have trouble with conjugate gradients or conjugate directions, I always refer them to the Paige and Saunders algorithm LSQR. Methods that form $\bold H$ explicitly or implicitly (including both the standard literature and the method of this book) square the condition number; that is, they lose roughly twice as many significant digits to rounding error as does LSQR.
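For comparison, a sketch of the LSQR route, reusing the placeholder F, W, A, and d from the previous sketch: instead of forming $\bold H$, stack the fitting goals (90)-(91) into one overdetermined system and hand LSQR the forward and adjoint applications.

    from scipy.sparse.linalg import LinearOperator, lsqr

    def apply_G(m):                     # forward: model -> stacked data
        return np.concatenate([W @ (F @ m), A @ m])

    def apply_Gt(r):                    # adjoint: stacked data -> model
        return F.T @ (W.T @ r[:30]) + A.T @ r[30:]

    G = LinearOperator((40, 10), matvec=apply_G, rmatvec=apply_Gt)
    rhs = np.concatenate([W @ d, np.zeros(10)])
    m_lsqr = lsqr(G, rhs)[0]

Because LSQR orthogonalizes against the stacked operator itself rather than against its normal-equations product, its sensitivity to rounding goes with the condition number of that operator, not with its square.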

