Next: About this document ... Up: Preconditioning Previous: SCALING THE ADJOINT

A FORMAL DEFINITION FOR ADJOINTS

In mathematics, adjoints are defined a little differently than we have defined them here (as matrix transposes). The mathematician begins by telling us that we cannot simply form any dot product we want. We are not allowed to take the dot product of any two vectors in model space $\bold m_1 \cdot \bold m_2$ or data space $\bold d_1 \cdot \bold d_2$ .Instead, we must first transform them to a preferred coordinate system. Say $\tilde \bold m_1 = \bold M \bold m_1$ and $\tilde \bold d_1 = \bold D \bold d_1$ , etc for other vectors. We complain we do not know $\bold M$ and $\bold D$ .They reply that we do not really need to know them but we do need to have the inverses (aack!) of $\bold M'\bold M$ and $\bold D'\bold D$ .A pre-existing common notation is $\bold\sigma_m^{-2} = \bold M'\bold M$ and $\bold\sigma_d^{-2} = \bold D'\bold D$ .Now the mathematician buries the mysterious new positive-definite matrix inverses in the definition of dot product $<\bold m_1,\bold m_2\gt = \bold m'_1 \bold M'\bold M \bold m_2 = \bold m'_1 \bold \sigma_m^{-2} \bold m_2$ and likewise with $<\bold d_1,\bold d_2\gt$ .This suggests a total reorganization of our programs. Instead of computing $(\bold m'_1 \bold M') (\bold M \bold m_2)$ we could compute $\bold m'_1 (\bold\sigma_m^{-2} \bold m_2)$ .Indeed, this is the ``conventional'' approach. This definition of dot product would be buried in the solver code. The other thing that would change would be the search direction $\Delta \bold m$ .Instead of being the gradient as we have defined it $\Delta \bold m=\bold L'\bold r$ ,it would be $\Delta \bold m=\bold\sigma_m^{-2} \bold L'\bold\sigma_d^{-2}\bold r$ .A mathematician would define the adjoint of $\bold L$ to be $\bold\sigma_m^{-2} \bold L'\bold\sigma_d^{-2}$ .(Here $\bold L'$ remains matrix transpose.) You might notice this approach nicely incorporates both residual weighting and preconditioning while yet evading the issue of where we get the matrices $\sigma_m^2$ and $\sigma_d^2$ or how we invert them. Fortunately, upcoming chapter

suggests how, in image estimation problems, to obtain sensible estimates of the elusive operators $\bold M$ and $\bold D$ .Paranthetically, modeling calculations in physics and engineering often use similar mathematics in which the role of $\bold M'\bold M$ is not so mysterious. Kinetic energy is mass times velocity squared. Mass can play the role of $\bold M'\bold M$ .

So, should we continue to use $(\bold m'_1 \bold M') (\bold M \bold m_2)$ or should we take the conventional route and go with $\bold m'_1 (\bold\sigma_m^{-2} \bold m_2)$ ? One day while benchmarking a wide variety of computers I was shocked to see some widely differing numerical results. Now I know why. Consider adding 10⁷ identical positive floating point numbers, say 1.0's, in an arithmetic with precision of 10^-6. After you have added in the first 10⁶ numbers, the rest will all truncate in the roundoff and your sum will be wrong by a factor of ten. If the numbers were added in pairs, and then the pairs added, etc, there would be no difficulty. Precision is scary stuff!

It is my understanding and belief that there is nothing wrong with the approach of this book, in fact, it seems to have some definite advantages. While the conventional approach requires one to compute the adjoint correctly, we do not. The method of this book (which I believe is properly called conjugate directions) has a robustness that, I'm told, has been superior in some important geophysical applications. The conventional approach seems to get in trouble when transpose operators are computed with insufficient precision.

Next: About this document ... Up: Preconditioning Previous: SCALING THE ADJOINT

Stanford Exploration Project
4/27/2004