
A quasi-Newton method for solving nonlinear problems

The method I present in this paper is suitable for smooth functions that possess local minima; it is not a method for global optimization, where the global minimum is sought. We define $\bold{m}^*$ as a local minimizer of $f({\bf m})$ and we assume that $f({\bf m})$ and $\bold{m}^*$ satisfy the ``standard requirements'':
1.
f is twice differentiable,
2.
$\nabla f(\bold{m}^*) = 0$,
3.
$\nabla^2 f(\bold{m}^*)$ is positive definite, i.e., $\bold{m}'\nabla^2 f(\bold{m}^*)\bold{m}>0$ for all nonzero $\bold{m}\in\Re^N$ (' denotes the adjoint),
where N is the dimension of the model vector ${\bf m}$ and $\Re^N$ is the real space for the model vector ${\bf m}$. Any vector ${\bf m}^*$ that satisfies the standard requirements is a local minimizer of $f({\bf m})$.
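As a simple illustration (not part of the algorithm itself), consider the quadratic function
\begin{displaymath}
f({\bf m})=\frac{1}{2}{\bf m}'{\bf A}{\bf m}-{\bf b}'{\bf m},\end{displaymath}
with ${\bf A}$ a symmetric positive-definite $N\times N$ matrix. Its gradient $\nabla f({\bf m})={\bf A}{\bf m}-{\bf b}$ vanishes at ${\bf m}^*={\bf A}^{-1}{\bf b}$, and its Hessian $\nabla^2 f({\bf m})={\bf A}$ is positive definite, so ${\bf m}^*$ satisfies the standard requirements and is the unique minimizer.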

Newton's method is an iterative process where the solution to the problem is updated as follows:
\begin{displaymath}
\bold{m}_{k+1}=\bold{m}_k - \lambda_k \bold{H}_k^{-1}\nabla f(\bold{m}_k),\end{displaymath} (72)
where $\bold{m}_{k+1}$ is the updated solution at iteration k+1, $\lambda_k$ is the step length computed by a line search that ensures a sufficient decrease of $f({\bf m})$, and $\bold{H}_k=\nabla^2 f(\bold{m}_k)$ is the Hessian (or second derivative). In many circumstances the inverse of the Hessian cannot be computed directly, for example when the matrix ${\bf H}$ is too big or when operators are used rather than matrices. Fortunately, we may still be able to compute an approximation of the Hessian of $f({\bf m})$. This strategy gives rise to quasi-Newton methods, in which the way the Hessian approximation is computed determines the method Kelley (1999).
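A minimal sketch of one damped Newton update with a backtracking line search is given below; it is an illustration under assumed interfaces, not the implementation used in this paper. The callables f, grad, and hess, the Armijo constant c, and the shrink factor are hypothetical choices.
\begin{verbatim}
import numpy as np

def newton_step(f, grad, hess, m, c=1e-4, shrink=0.5, max_backtracks=30):
    # One damped Newton update: m_{k+1} = m_k - lambda_k * H_k^{-1} grad f(m_k).
    g = grad(m)
    # Solve H_k d = grad f(m_k) instead of forming the inverse explicitly.
    d = np.linalg.solve(hess(m), g)
    f0 = f(m)
    lam = 1.0
    # Backtracking line search enforcing a sufficient (Armijo) decrease of f.
    while f(m - lam * d) > f0 - c * lam * g.dot(d) and max_backtracks > 0:
        lam *= shrink
        max_backtracks -= 1
    return m - lam * d
\end{verbatim}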

A possible update of the Hessian is provided by the BFGS technique Broyden (1969); Fletcher (1970); Goldfarb (1970); Shanno (1970). The BFGS update is given by
\begin{displaymath}
\bold{H}_{k+1}=\bold{H}_k+\frac{\bold{yy}'}{\bold{y}'\bold{s}}-
\frac{(\bold{H}_k\bold{s})(\bold{H}_k\bold{s})'}{\bold{s}'\bold{H}_k\bold{s}},\end{displaymath} (73)
where $\bold{s}=\bold{m}_{k+1}-\bold{m}_k$ and $\bold{y}=\nabla f(\bold{m}_{k+1})-\nabla f(\bold{m}_k)$. In practice, however, we write the previous equation in terms of the inverse matrices:
\begin{displaymath}
\bold{H}_{k+1}^{-1}=\left(\bold{I}-\frac{\bold{sy}'}{\bold{y}'\bold{s}}\right)\bold{H}_k^{-1}
\left(\bold{I}-\frac{\bold{ys}'}{\bold{y}'\bold{s}}\right)+
\frac{\bold{ss}'}{\bold{y}'\bold{s}}. \end{displaymath} (74)
In addition, we use the history of the iterations to compute the new Hessian rather than storing the full matrix $\bold{H}_k^{-1}$. This requires that a gradient step vector ${\bf y}$ and a solution step vector ${\bf s}$ be kept in memory after each iteration. Consequently, this method might not be affordable with large data and model spaces. In the next section, a modified version of the BFGS method that limits the storage needed to compute the Hessian update is proposed.
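A minimal sketch of the inverse update in equation (74) follows, using dense NumPy arrays purely for illustration; the function name and the curvature safeguard eps are assumptions, and the matrix-free, history-based variant is the subject of the next section.
\begin{verbatim}
import numpy as np

def bfgs_inverse_update(Hinv, s, y, eps=1e-12):
    # Equation (74): H_{k+1}^{-1} = (I - s y'/y's) H_k^{-1} (I - y s'/y's) + s s'/y's,
    # with s = m_{k+1} - m_k and y = grad f(m_{k+1}) - grad f(m_k).
    ys = y.dot(s)
    if ys <= eps:
        # Curvature condition violated; skip the update to keep Hinv positive definite.
        return Hinv
    rho = 1.0 / ys
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ Hinv @ V.T + rho * np.outer(s, s)
\end{verbatim}
The updated approximation is then applied to the new gradient to obtain the next search direction, $-\bold{H}_{k+1}^{-1}\nabla f(\bold{m}_{k+1})$.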

