
THE PROCESS OF AUTOMATIC DIFFERENTIATION

Automatic differentiation is the process of producing the values of a function's derivative from some representation of the function itself. The main difference from symbolic differentiation (as performed, for example, by Mathematica) is that automatic differentiation does not generate a symbolic expression for the derivative. Automatic differentiation is primarily concerned with the differentiation of algorithms, and the algorithm can be written in a general-purpose programming language. This flexibility is particularly useful in exploration geophysics: we can often write down the algorithm, but computing the Jacobian, or even a good approximation to it, analytically is very tedious, if not impossible.
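To make this concrete, consider a small illustration (my own, not taken from this report): the subroutine below evaluates a polynomial by Horner's scheme, so the function is defined by a loop rather than by a single closed-form expression, and it is this algorithm that an automatic differentiator would operate on. The routine name and argument list are hypothetical.

\begin{verbatim}
c     Illustration only: a function defined by an algorithm.
c     horner evaluates  y = c(1) + c(2)*x + ... + c(n)*x**(n-1)
c     with a loop; automatic differentiation works directly on
c     this loop rather than on a symbolic expression for y.
      subroutine horner(n, c, x, y)
      integer n, i
      real c(n), x, y
      y = c(n)
      do 10 i = n-1, 1, -1
         y = y*x + c(i)
 10   continue
      return
      end
\end{verbatim}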

For example, consider a vector function of the form $f : R^m \rightarrow R^n$. If $X = (x_1, x_2, \ldots, x_m)$ is a vector of length $m$ and $Y = (y_1, y_2, \ldots, y_n)$ a vector of length $n$, then, depending on whether $Y = f(X)$ is scalar- or vector-valued, we have
\begin{displaymath}
{\rm Gradient}\qquad G_i = {{\partial f}\over{\partial x_i}}\end{displaymath} (1)
or
\begin{displaymath}
{\rm Hessian}\qquad H_{ij} = {{\partial^2 f}\over{\partial x_i \, \partial x_j}}\end{displaymath} (2)
or
\begin{displaymath}
{\rm Jacobian}\qquad J_{ij} = {{\partial y_i}\over{\partial x_j}}\end{displaymath} (3)
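As a concrete illustration (an example chosen here, not one from the report), for the scalar function $f(x_1,x_2) = x_1 x_2 + \sin x_1$ the gradient of equation (1) is
\begin{displaymath}
G = \left( {{\partial f}\over{\partial x_1}},\;
{{\partial f}\over{\partial x_2}} \right)
= \left( x_2 + \cos x_1,\; x_1 \right) .
\end{displaymath}
For a vector-valued $f$, each row of the Jacobian in equation (3) is such a gradient of one component $y_i$.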

In our application $f$ can take many forms; common examples are global semblance and other objective functions used during a minimization process. We generally know very well how to code the objective function, whether linear or nonlinear. For example, it is clear how to write the algorithm for prestack depth migration; it is much less clear how to calculate its gradient with respect to the velocity model. In many cases the only choice is to approximate the gradient or Jacobian, analytically or numerically, and in iterative problems the convergence depends directly on the Jacobian.
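To show what a purely numerical approximation involves, here is a minimal sketch (my own, with hypothetical routine names) of a one-sided finite-difference estimate of the gradient of an objective function: each component costs one extra evaluation of the objective and is accurate only to first order in the step h.

\begin{verbatim}
c     Hypothetical sketch: forward-difference approximation of
c     the gradient g(i) = df/dx(i) of an objective function.
c     func(n, x, f) is assumed to return the objective value f
c     for the model vector x of length n.
      subroutine fdgrad(func, n, x, h, g)
      integer n, i
      real x(n), g(n), h, f0, f1, xi
      external func
      call func(n, x, f0)
      do 20 i = 1, n
         xi = x(i)
         x(i) = xi + h
         call func(n, x, f1)
         g(i) = (f1 - f0)/h
         x(i) = xi
 20   continue
      return
      end
\end{verbatim}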

Automatic differentiation applied to algorithms and functions has the advantage of yielding derivatives or Jacobians that are accurate to machine precision. The method avoids approximations by using a preprocessor for a high-level language such as Fortran to parse the code. The code is split up logically and data-dependency trees are built. One simply has to tell the preprocessor which variables are dependent and which are independent; usually this is done in a comment at the beginning of the code. The automatic differentiator then has two basic choices: it can comb through the data-dependency tree from top to bottom (forward propagation) or from bottom to top (reverse propagation), and the more efficient choice depends on the type of function to be differentiated. The automatic differentiator knows the basic rules of differentiation (the chain rule, the product rule, and so on). It creates additional code that stores the temporary variables needed to propagate derivative values through the dependency tree; the details are explained in Rall (1981). This temporary storage may increase the overall memory requirements of the subroutine, but, depending on the problem, that may be preferable to running the program several times and obtaining only an approximate value.
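As a rough sketch of what such generated code looks like, the routine below propagates derivatives in forward mode for the simple function f(x1,x2) = x1*x2 + sin(x1). It is written by hand for illustration; actual preprocessor output (for instance from JAKEF) will differ in naming and structure, but the idea is the same: each intermediate value carries its derivatives along, produced by the product, chain, and sum rules, and the variables t1, t2 and their d* companions are the extra temporaries mentioned above.

\begin{verbatim}
c     Hand-written sketch of forward-mode derivative propagation
c     for f(x1,x2) = x1*x2 + sin(x1); not actual JAKEF output.
      subroutine fgrad(x1, x2, f, dfdx1, dfdx2)
      real x1, x2, f, dfdx1, dfdx2
      real t1, dt1dx1, dt1dx2
      real t2, dt2dx1, dt2dx2
c     t1 = x1*x2        (product rule)
      t1 = x1*x2
      dt1dx1 = x2
      dt1dx2 = x1
c     t2 = sin(x1)      (chain rule)
      t2 = sin(x1)
      dt2dx1 = cos(x1)
      dt2dx2 = 0.0
c     f = t1 + t2       (sum rule)
      f = t1 + t2
      dfdx1 = dt1dx1 + dt2dx1
      dfdx2 = dt1dx2 + dt2dx2
      return
      end
\end{verbatim}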

There are now several automatic differentiation preprocessors. I have chosen to look at one, called JAKEF (1990), which is freely available from NETLIB. This code has the advantage that it operates on Fortran subroutines and is easy to use; it has, however, some severe limitations for geophysical algorithms. In the future I would like to evaluate ADIFOR, a preprocessor written by Griewank (1991). This new preprocessor has not yet been released, but will soon be available. A third preprocessor, called Padre2, is available from Kubota (1992).

