
A quasi-Newton method for solving nonlinear problems

The method I present in this paper is suitable for smooth functions that possess local minima; it is not a method for global optimization, where the global minimum is sought. We define $\bold{m}^*$ as a local minimizer of $f({\bf m})$ and we assume that $f({\bf m})$ and $\bold{m}^*$ satisfy the ``standard requirements'':
1.
f is twice differentiable,
2.
$\nabla f(\bold{m}^*) = 0$,
3.
$\nabla^2 f(\bold{m}^*)$ is positive definite, i.e., $\bold{m}'\nabla^2 f(\bold{m}^*)\bold{m}>0$ for all nonzero $\bold{m}\in\Re^N$ (' denotes the adjoint),
where N is the dimension of the model vector ${\bf m}$ and $\Re^N$ is the real space for the model vector ${\bf m}$. Any vector ${\bf m}^*$ that satisfies the standard requirements is a local minimizer of $f({\bf m})$.
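As a simple illustration (not part of the algorithm itself), consider the quadratic function
\begin{displaymath}
f({\bf m})=\frac{1}{2}{\bf m}'{\bf A}{\bf m}-{\bf b}'{\bf m},\end{displaymath}
with ${\bf A}$ a symmetric positive-definite $N\times N$ matrix. Its gradient $\nabla f({\bf m})={\bf A}{\bf m}-{\bf b}$ vanishes at ${\bf m}^*={\bf A}^{-1}{\bf b}$, and its Hessian $\nabla^2 f({\bf m})={\bf A}$ is positive definite, so ${\bf m}^*$ satisfies the standard requirements and is the unique minimizer.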

Newton's method is an iterative process where the solution to the problem is updated as follows:
\begin{displaymath}
\bold{m}_{k+1}=\bold{m}_k - \lambda_k \bold{H}_k^{-1}\nabla f(\bold{m}_k),\end{displaymath} (72)
where $\bold{m}_{k+1}$ is the updated solution at iteration k+1, $\lambda_k$ is the step length computed by a line search that ensures a sufficient decrease of $f({\bf m})$, and $\bold{H}_k=\nabla^2 f(\bold{m}_k)$ is the Hessian (or second derivative). In many circumstances the inverse of the Hessian cannot be computed directly, for example when the matrix ${\bf H}$ is too big or when operators are used rather than matrices. Fortunately, we may still be able to compute an approximation of the Hessian of $f({\bf m})$. This strategy gives rise to quasi-Newton methods, in which the way the Hessian approximation is computed determines the method Kelley (1999).
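A minimal sketch of one damped Newton update with a backtracking line search is given below; it is an illustration under assumed interfaces, not the implementation used in this paper. The callables f, grad, and hess, the Armijo constant c, and the shrink factor are hypothetical choices.
\begin{verbatim}
import numpy as np

def newton_step(f, grad, hess, m, c=1e-4, shrink=0.5, max_backtracks=30):
    # One damped Newton update: m_{k+1} = m_k - lambda_k * H_k^{-1} grad f(m_k).
    g = grad(m)
    # Solve H_k d = grad f(m_k) instead of forming the inverse explicitly.
    d = np.linalg.solve(hess(m), g)
    f0 = f(m)
    lam = 1.0
    # Backtracking line search enforcing a sufficient (Armijo) decrease of f.
    while f(m - lam * d) > f0 - c * lam * g.dot(d) and max_backtracks > 0:
        lam *= shrink
        max_backtracks -= 1
    return m - lam * d
\end{verbatim}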

A possible update of the Hessian is provided by the BFGS technique Broyden (1969); Fletcher (1970); Goldfarb (1970); Shanno (1970). The BFGS update is given by
\begin{displaymath}
\bold{H}_{k+1}=\bold{H}_k+\frac{\bold{yy}'}{\bold{y}'\bold{s}}-
\frac{(\bold{H}_k\bold{s})(\bold{H}_k\bold{s})'}{\bold{s}'\bold{H}_k\bold{s}},\end{displaymath} (73)
where $\bold{s}=\bold{m}_{k+1}-\bold{m}_k$ and $\bold{y}=\nabla f(\bold{m}_{k+1})-\nabla f(\bold{m}_k)$. In practice, however, we write the previous equation in terms of the inverse matrices:
\begin{displaymath}
\bold{H}_{k+1}^{-1}=\left(\bold{I}-\frac{\bold{sy}'}{\bold{y}'\bold{s}}\right)\bold{H}_k^{-1}
\left(\bold{I}-\frac{\bold{ys}'}{\bold{y}'\bold{s}}\right)+
\frac{\bold{ss}'}{\bold{y}'\bold{s}}. \end{displaymath} (74)
In addition, we use the history of the iterations to compute the new Hessian rather than storing the full matrix $\bold{H}_k^{-1}$. This requires that a gradient step vector ${\bf y}$ and a solution step vector ${\bf s}$ be kept in memory after each iteration. Consequently, this method might not be affordable with large data and model spaces. In the next section, a modified version of the BFGS method that limits the storage needed to compute the Hessian update is proposed.
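A minimal sketch of the inverse update in equation (74) follows, using dense NumPy arrays purely for illustration; the function name and the curvature safeguard eps are assumptions, and the matrix-free, history-based variant is the subject of the next section.
\begin{verbatim}
import numpy as np

def bfgs_inverse_update(Hinv, s, y, eps=1e-12):
    # Equation (74): H_{k+1}^{-1} = (I - s y'/y's) H_k^{-1} (I - y s'/y's) + s s'/y's,
    # with s = m_{k+1} - m_k and y = grad f(m_{k+1}) - grad f(m_k).
    ys = y.dot(s)
    if ys <= eps:
        # Curvature condition violated; skip the update to keep Hinv positive definite.
        return Hinv
    rho = 1.0 / ys
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ Hinv @ V.T + rho * np.outer(s, s)
\end{verbatim}
The updated approximation is then applied to the new gradient to obtain the next search direction, $-\bold{H}_{k+1}^{-1}\nabla f(\bold{m}_{k+1})$.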

