Standard methods

Next: Understanding CG magic and Up: THE WORLD OF CONJUGATE Previous: Coding nonlinear fitting problems

Standard methods

The conjugate-direction method is really a family of methods. Mathematically, where there are n unknowns, these algorithms all converge to the answer in n (or fewer) steps. The various methods differ in numerical accuracy, treatment of underdetermined systems, accuracy in treating ill-conditioned systems, space requirements, and numbers of dot products. Technically, the method of CD used in the cgstep module

is not the conjugate-gradient method itself, but is equivalent to it. This method is more properly called the conjugate-direction method with a memory of one step. I chose this method for its clarity and flexibility. If you would like a free introduction and summary of conjugate-gradient methods, I particularly recommend An Introduction to Conjugate Gradient Method Without Agonizing Pain by Jonathon Shewchuk, which you can downloadhttp://www.cs.cmu.edu/afs/cs/project/quake/public/papers/painless-conjugate-gradient.ps.

I suggest you skip over the remainder of this section and return after you have seen many examples and have developed some expertise, and have some technical problems.

The conjugate-gradient method was introduced by Hestenes and Stiefel in 1952. To read the standard literature and relate it to this book, you should first realize that when I write fitting goals like

$\begin{eqnarray} 0 &\approx& \bold W( \bold F \bold m - \bold d ) \\ 0 &\approx& \bold A \bold m,\end{eqnarray}$ (90)

(91)

they are equivalent to minimizing the quadratic form:

$\begin{displaymath} \bold m: \quad\quad \min_{\bold m} Q(\bold m) \eq ( \bold m'... ... \bold F \bold m - \bold d) \ +\ \bold m'\bold A'\bold A\bold m\end{displaymath}$ (92)

The optimization theory (OT) literature starts from a minimization of

$\begin{displaymath} \bold x: \quad\quad \min_{\bold x} Q(\bold x) \eq \bold x'\bold H \bold x - \bold b' \bold x\end{displaymath}$ (93)

To relate equation (92) to (93) we expand the parentheses in (92) and abandon the constant term $\bold d'\bold d$ .Then gather the quadratic term in $\bold m$ and the linear term in $\bold m$ .There are two terms linear in $\bold m$ that are transposes of each other. They are scalars so they are equal. Thus, to invoke ``standard methods,'' you take your problem-formulation operators $\bold F$ , $\bold W$ , $\bold A$ and create two modules that apply the operators

$\begin{eqnarray} \bold H &=& \bold F'\bold W'\bold W\bold F + \bold A'\bold A \\ \bold b' &=& 2(\bold F'\bold W'\bold W\bold d)'\end{eqnarray}$ (94)

(95)

The operators $\bold H$ and $\bold b'$ operate on model space. Standard procedures do not require their adjoints because $\bold H$ is its own adjoint and $\bold b'$ reduces model space to a scalar. You can see that computing $\bold H$ and $\bold b'$ requires one temporary space the size of data space (whereas cgstep requires two).

When people have trouble with conjugate gradients or conjugate directions, I always refer them to the Paige and Saunders algorithm LSQR. Methods that form $\bold H$ explicitly or implicitly (including both the standard literature and the book3 method) square the condition number, that is, they are twice as susceptible to rounding error as is LSQR.

Next: Understanding CG magic and Up: THE WORLD OF CONJUGATE Previous: Coding nonlinear fitting problems

Stanford Exploration Project
4/27/2004

	$\begin{eqnarray} 0 &\approx& \bold W( \bold F \bold m - \bold d ) \\ 0 &\approx& \bold A \bold m,\end{eqnarray}$	(90)
		(91)

	$\begin{eqnarray} \bold H &=& \bold F'\bold W'\bold W\bold F + \bold A'\bold A \\ \bold b' &=& 2(\bold F'\bold W'\bold W\bold d)'\end{eqnarray}$	(94)
		(95)