
Method of random directions and steepest descent

Let us minimize the sum of the squares of the components of the residual vector given by
\begin{eqnarray}
{\rm residual} &=& {\rm transform} \ \ \ \ {\rm model\ space} \ \ -\ \ {\rm data\ space} \\
\bold r &=&
\left[ \matrix{ \cr \cr \bold F \cr \cr \cr} \right]
\left[ \matrix{ \cr \bold x \cr \cr} \right]
\ -\
\left[ \matrix{ \cr \cr \bold d \cr \cr \cr} \right]
\end{eqnarray} (39) (40)

A contour plot is based on an altitude function of space. The altitude is the dot product $\bold r \cdot \bold r$. By finding the lowest altitude, we are driving the residual vector $\bold r$ as close as we can to zero. If the residual vector $\bold r$ reaches zero, then we have solved the simultaneous equations $\bold d = \bold F \bold x$. In a two-dimensional world the vector $\bold x$ has two components, $(x_1, x_2)$. A contour is a curve of constant $\bold r \cdot \bold r$ in $(x_1, x_2)$-space. These contours have a statistical interpretation as contours of uncertainty in $(x_1, x_2)$, with measurement errors in $\bold d$.
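To make the altitude function concrete, here is a minimal numpy sketch (the matrix F and the data d are hypothetical stand-ins, not taken from the text) that evaluates $\bold r \cdot \bold r$ over a grid of $(x_1, x_2)$ values; contours of the resulting array are the curves described above.

import numpy as np

F = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [1.0, -1.0]])    # hypothetical 3x2 transform F
d = np.array([1.0, 2.0, 0.5])  # hypothetical data d

# Evaluate the altitude r . r on a grid of (x1, x2) values.
x1, x2 = np.meshgrid(np.linspace(-2.0, 2.0, 41),
                     np.linspace(-2.0, 2.0, 41))
altitude = np.zeros_like(x1)
for i in range(x1.shape[0]):
    for j in range(x1.shape[1]):
        r = F @ np.array([x1[i, j], x2[i, j]]) - d   # r = F x - d
        altitude[i, j] = r @ r                       # dot product r . r
# Curves of constant 'altitude' are the contours described above.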

Let us see how a random search direction can be used to reduce the residual $0 \approx \bold r = \bold F \bold x - \bold d$. Let $\Delta \bold x$ be an abstract vector with the same number of components as the solution $\bold x$, and let $\Delta \bold x$ contain arbitrary or random numbers. We add to $\bold x$ an unknown amount $\alpha$ of the vector $\Delta \bold x$, and thereby create $\bold x_{\rm new}$:
 \begin{displaymath}
\bold x_{\rm new} \eq \bold x+\alpha \Delta \bold x\end{displaymath} (41)
This gives a new residual:
\begin{eqnarray}
\bold r_{\rm new} &=& \bold F\ \bold x_{\rm new} - \bold d \\
\bold r_{\rm new} &=& \bold F\, (\bold x + \alpha \Delta \bold x) - \bold d \\
\bold r_{\rm new} \eq \bold r + \alpha \Delta \bold r &=& \bold F \bold x - \bold d + \alpha \bold F \Delta \bold x
\end{eqnarray} (42) (43) (44)
which defines $\Delta \bold r = \bold F \Delta \bold x$.

Next we adjust $\alpha$ to minimize the dot product $\bold r_{\rm new} \cdot \bold r_{\rm new}$:
\begin{displaymath}
(\bold r+\alpha\Delta \bold r)\cdot (\bold r+\alpha\Delta \bold r)
\eq
\bold r \cdot \bold r
\ +\ 2\alpha\, (\bold r \cdot \Delta \bold r) \ +\
\alpha^2\, \Delta \bold r \cdot \Delta \bold r
\end{displaymath} (45)
Set its derivative with respect to $\alpha$ to zero, using the chain rule:
\begin{displaymath}
0 \eq
(\bold r+\alpha\Delta \bold r) \cdot \Delta \bold r
\ +\
\Delta \bold r \cdot (\bold r+\alpha\Delta \bold r)
\eq
2\, (\bold r+\alpha\Delta \bold r) \cdot \Delta \bold r
\end{displaymath} (46)
which says that the new residual $\bold r_{\rm new} = \bold r + \alpha \Delta \bold r$ is perpendicular to the ``fitting function'' $\Delta \bold r$. Solving gives the required value of $\alpha$:
 \begin{displaymath}
\alpha \eq - \ { (\bold r \cdot \Delta \bold r ) \over
( \Delta \bold r \cdot \Delta \bold r ) }\end{displaymath} (47)
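As a quick numerical check of equations (46) and (47), the following sketch (with a hypothetical random F and d) computes $\alpha$ and confirms that the new residual is perpendicular to $\Delta \bold r$:

import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((5, 3))   # hypothetical transform F
d = rng.standard_normal(5)        # hypothetical data d
x = np.zeros(3)                   # starting model

r = F @ x - d                     # residual r = F x - d
dx = rng.standard_normal(3)       # random search direction Delta x
dr = F @ dx                       # Delta r = F Delta x
alpha = -(r @ dr) / (dr @ dr)     # equation (47)

r_new = r + alpha * dr
print(r_new @ dr)                 # ~0: r_new is perpendicular to Delta r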

A ``computation template'' for the method of random directions is

 


$\bold r \quad\longleftarrow\quad \bold F \bold x - \bold d$
iterate {
    $\Delta \bold x \quad\longleftarrow\quad {\rm random\ numbers}$
    $\Delta \bold r \quad\longleftarrow\quad \bold F \ \Delta \bold x$
    $\alpha \quad\longleftarrow\quad -(\bold r \cdot \Delta\bold r)/(\Delta \bold r \cdot \Delta\bold r)$
    $\bold x \quad\longleftarrow\quad \bold x + \alpha \Delta \bold x$
    $\bold r \quad\longleftarrow\quad \bold r + \alpha \Delta \bold r$
}
A nice thing about the method of random directions is that you do not need to know the adjoint operator $\bold F'$.
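A direct transcription of this template into numpy might look like the following sketch; the dense matrix F is a hypothetical stand-in for whatever operator is at hand, and only forward products $\bold F\,\Delta\bold x$ appear, never the adjoint:

import numpy as np

def random_directions(F, d, x, niter, seed=0):
    """Reduce |F x - d|^2 along random directions; no adjoint needed."""
    rng = np.random.default_rng(seed)
    r = F @ x - d                          # r <- F x - d
    for _ in range(niter):
        dx = rng.standard_normal(x.size)   # Delta x <- random numbers
        dr = F @ dx                        # Delta r <- F Delta x
        alpha = -(r @ dr) / (dr @ dr)      # alpha <- -(r . dr)/(dr . dr)
        x = x + alpha * dx                 # x <- x + alpha Delta x
        r = r + alpha * dr                 # r <- r + alpha Delta r
    return x, r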

In practice, random directions are rarely used; it is more common to use the gradient direction instead. Notice that a vector with the same number of components as $\Delta \bold x$ is
\begin{displaymath}
\bold g \eq \bold F' \bold r\end{displaymath} (48)
Notice also that this vector can be found by taking the gradient of the squared size of the residual:
\begin{displaymath}
{\partial \over \partial \bold x' } \ \bold r \cdot \bold r
\eq
{\partial \over \partial \bold x' } \
( \bold x' \bold F' \ -\ \bold d') \,
( \bold F \, \bold x \ -\ \bold d)
\eq
\bold F' \, ( \bold F \, \bold x \ -\ \bold d)
\eq
\bold F' \ \bold r
\end{displaymath} (49)
Choosing $\Delta \bold x$ to be the gradient vector $\Delta\bold x = \bold g = \bold F' \bold r$ is called ``the method of steepest descent.''
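The following sketch (again with hypothetical F, d, and x) checks equation (48) against a finite-difference gradient of $\bold r \cdot \bold r$. Ordinary real calculus gives $2\bold F'\bold r$; the convention above of differentiating with respect to $\bold x'$ alone drops the factor of 2, which changes only the scale of the step, not its direction:

import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((6, 4))   # hypothetical transform F
d = rng.standard_normal(6)        # hypothetical data d
x = rng.standard_normal(4)        # hypothetical model x

r = F @ x - d
g = F.T @ r                       # g = F' r, equation (48)

def rr(x):                        # the altitude function r . r
    r = F @ x - d
    return r @ r

eps = 1e-6
fd = np.array([(rr(x + eps * e) - rr(x)) / eps for e in np.eye(4)])
print(np.allclose(fd, 2.0 * g, atol=1e-4))   # True: full gradient is 2 F' r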

Starting from a model $\bold x = \bold m$ (which may be zero), the residual $\bold 0 \approx \bold r = \bold F \bold x - \bold d$ can be minimized by the steepest-descent method with the following pseudocode template:


$\bold r \quad\longleftarrow\quad \bold F \bold x - \bold d$
iterate {
    $\Delta \bold x \quad\longleftarrow\quad \bold F'\ \bold r$
    $\Delta \bold r \quad\longleftarrow\quad \bold F \ \Delta \bold x$
    $\alpha \quad\longleftarrow\quad -(\bold r \cdot \Delta\bold r)/(\Delta \bold r \cdot \Delta\bold r)$
    $\bold x \quad\longleftarrow\quad \bold x + \alpha \Delta \bold x$
    $\bold r \quad\longleftarrow\quad \bold r + \alpha \Delta \bold r$
}
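Under the same hypothetical dense-matrix conventions as the earlier sketch, this template transcribes to:

import numpy as np

def steepest_descent(F, d, x, niter):
    """Minimize |F x - d|^2 by steepest descent (adjoint F' = F.T here)."""
    r = F @ x - d                     # r <- F x - d
    for _ in range(niter):
        dx = F.T @ r                  # Delta x <- F' r  (the gradient g)
        dr = F @ dx                   # Delta r <- F Delta x
        alpha = -(r @ dr) / (dr @ dr) # alpha <- -(r . dr)/(dr . dr)
        x = x + alpha * dx            # x <- x + alpha Delta x
        r = r + alpha * dr            # r <- r + alpha Delta r
    return x, r

# Example use, starting from the zero model:
# x, r = steepest_descent(F, d, np.zeros(F.shape[1]), niter=20)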
