
                                Tue Dec 13 09:09:34 CST 1994

Jon Claerbout,

I do not get excited about the definition of "residual,"
because it does not seem to be a common label for
the "disagreement between modeled and recorded data."
If you were proposing a change in the definition of "error," 
then I would object.  Most everyone understands "error"
to be the recorded data minus modeled data.

Dave may think that residual is a common synonym for error. 
If so, then I recommend picking yet another word from a thesaurus:
either one with the same initial (residuum, remanent, remnant),
so that your previous text need not change, or one with a different
initial, so that it is not confused with what you usually
understand residual to be.

Dave says "I'm not keen on this idea. It makes it hard to 
compare all the algorithms that you write with the standard 
algorithms in textbooks."  (Note that I do not resort to
multiple levels of angle brackets: >>>>>.)

My optimization books usually do not mention errors or residuals.
Linear algorithms assume that the reader has already rewritten 
his objective function as a simple quadratic: x'Qx + b'x.
A quasi-Newton method assumes that you know how to approximate
an arbitrary objective function iteratively as a quadratic.
This is what I do, and I recommend that you do the same.  
The gradient is much simpler (2Qx + b for symmetric Q), and you'll 
never get the sign wrong again.  
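
A minimal sketch of the point about the gradient, assuming Python with
NumPy (neither appears in this letter) and a symmetric Q.  The function
names are made up for illustration.  It compares the analytic gradient
of x'Qx + b'x with a finite-difference estimate and then solves for
the minimizer:

```python
import numpy as np

def quadratic(Q, b, x):
    """Objective in the sign convention used here: x'Qx + b'x."""
    return x @ Q @ x + b @ x

def quadratic_gradient(Q, b, x):
    """Gradient of x'Qx + b'x when Q is symmetric: 2Qx + b."""
    return 2.0 * (Q @ x) + b

def finite_difference_gradient(fun, x, step=1e-6):
    """Central-difference gradient, used only as an independent check."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = step
        grad[i] = (fun(x + e) - fun(x - e)) / (2.0 * step)
    return grad

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A.T @ A + np.eye(4)          # symmetric positive definite test matrix
b = rng.standard_normal(4)
x = rng.standard_normal(4)

analytic = quadratic_gradient(Q, b, x)
numeric = finite_difference_gradient(lambda v: quadratic(Q, b, v), x)
gradient_matches = np.allclose(analytic, numeric, atol=1e-5)

# Setting the gradient to zero gives the minimizer: 2Qx + b = 0.
x_min = np.linalg.solve(2.0 * Q, -b)
```

With the other sign convention, x'Qx - b'x, the same steps apply with
b replaced by -b.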

As an exercise, rewrite Tarantola's long-winded objective function as 
a simple quadratic: (d-Fm)'D(d-Fm) + m'M m.  So x=m, Q=?, b=?.  
Do it once, get the sign right, then write your algorithm in
terms of Q and b.  For extra credit, calculate Q and b
for the best-fitting quadratic for a nonlinear f(m).  Hint:
f(m0+x) ~= f(m0) + Fx, with F the derivative of f at m0.  
All of your students could do this.
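
One way to check an answer to the exercise numerically is sketched
below, assuming Python with NumPy and symmetric D and M.  The function
names and the random test data are hypothetical.  Expanding the
objective gives Q = F'DF + M and b = -2F'Dd, plus a constant d'Dd that
does not affect the minimizer.  For the linearized case the unknown is
the step x about m0, with r0 = d - f(m0), and the same expansion gives
Q = F'DF + M and b = 2(M m0 - F'D r0):

```python
import numpy as np

def objective(F, D, M, d, m):
    """The objective from the exercise: (d-Fm)'D(d-Fm) + m'Mm."""
    r = d - F @ m
    return r @ D @ r + m @ M @ m

def quadratic_terms(F, D, M, d):
    """Q, b, c such that objective(m) = m'Qm + b'm + c (D symmetric)."""
    Q = F.T @ D @ F + M
    b = -2.0 * (F.T @ D @ d)
    c = d @ D @ d
    return Q, b, c

def linearized_quadratic_terms(F0, D, M, d, f0, m0):
    """Q, b for the step x about m0, using f(m0+x) ~= f0 + F0 x.

    f0 is f(m0) and F0 is the derivative of f at m0; D and M symmetric.
    """
    r0 = d - f0
    Q = F0.T @ D @ F0 + M
    b = 2.0 * (M @ m0 - F0.T @ D @ r0)
    return Q, b

rng = np.random.default_rng(1)
F = rng.standard_normal((6, 3))
D = np.diag(rng.uniform(0.5, 2.0, 6))   # diagonal data weights
M = 0.1 * np.eye(3)                     # simple damping
d = rng.standard_normal(6)
m = rng.standard_normal(3)

Q, b, c = quadratic_terms(F, D, M, d)
expansion_matches = np.isclose(objective(F, D, M, d, m),
                               m @ Q @ m + b @ m + c)

# With a linear f(m) = Fm the linearization is exact, so the step
# quadratic must reproduce the change in the objective.
m0 = rng.standard_normal(3)
x = rng.standard_normal(3)
Qx, bx = linearized_quadratic_terms(F, D, M, d, F @ m0, m0)
step_matches = np.isclose(objective(F, D, M, d, m0 + x),
                          x @ Qx @ x + bx @ x + objective(F, D, M, d, m0))
```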

An optimization uses functions that apply b'x and Qx, but 
the programmer can still supply the more convenient operations 
Fx, Dx, and Mx.  Your optimization need only deal with
vectors in the model space.
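
A rough sketch of that separation, assuming Python with NumPy; the
names make_apply_Q and minimize_quadratic are hypothetical.  The
programmer supplies Fx, Dx, and Mx as functions, plus the adjoint F'y,
which is needed to form Qx = F'D(Fx) + Mx.  The optimizer here is
plain conjugate gradients, chosen only as an example, and it sees
nothing but model-space vectors:

```python
import numpy as np

def make_apply_Q(apply_F, apply_Ft, apply_D, apply_M):
    """Build x -> Qx = F'D(Fx) + Mx from the supplied operators."""
    def apply_Q(x):
        return apply_Ft(apply_D(apply_F(x))) + apply_M(x)
    return apply_Q

def minimize_quadratic(apply_Q, b, n_iter=50, tol=1e-24):
    """Conjugate gradients for min x'Qx + b'x, Q symmetric positive definite.

    Equivalent to solving 2Qx = -b; only model-space vectors appear.
    """
    x = np.zeros_like(b)
    g = b.copy()                 # gradient 2Qx + b at x = 0
    p = -g                       # first search direction
    gg = g @ g
    for _ in range(n_iter):
        if gg <= tol:
            break
        Hp = 2.0 * apply_Q(p)    # Hessian (2Q) applied to the direction
        alpha = gg / (p @ Hp)
        x = x + alpha * p
        g = g + alpha * Hp
        gg_new = g @ g
        p = -g + (gg_new / gg) * p
        gg = gg_new
    return x

rng = np.random.default_rng(2)
F = rng.standard_normal((8, 3))
w = rng.uniform(0.5, 2.0, 8)     # diagonal of D
d = rng.standard_normal(8)

apply_Q = make_apply_Q(
    apply_F=lambda x: F @ x,
    apply_Ft=lambda y: F.T @ y,
    apply_D=lambda y: w * y,
    apply_M=lambda x: 0.1 * x,
)
b = -2.0 * (F.T @ (w * d))       # b = -2F'Dd, a model-space vector

x_cg = minimize_quadratic(apply_Q, b)

# Dense reference solution: (F'DF + M) x = F'Dd.
Q_dense = F.T @ (w[:, None] * F) + 0.1 * np.eye(3)
x_dense = np.linalg.solve(Q_dense, F.T @ (w * d))
```

Only make_apply_Q knows about the data space; replacing the optimizer
would not touch F, D, or M.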

For obvious reasons, you may prefer to use the letter H 
instead of Q. (Irony: Different books use different signs for b.  
Some prefer x'Qx - b'x.  I use the positive sign, but can't 
remember why.  It shouldn't matter.)  

Once your students have gotten used to this simplification of 
the objective function, they will find it much easier to 
read non-geophysical optimization books and apply new variations 
of algorithms.
