Next: The limited-memory BFGS method Up: Guitton: Quasi-Newton log-decon Previous: CONCLUSION


The L-BFGS method is suitable for smooth functions where local minima exist. It is not a method for global optimization where the global minimum is sought. L-BFGS is presented here in general terms using global definitions for the different variables: it does not follow the notations of the log-decon method.

We define $ \bold{m}^*$ as a local minimizer of $ f({\bf m})$ and we assume that $ f({\bf m})$ and $ \bold{m}^*$ satisfy the ``standard requirements'':

  1. $ f$ is twice differentiable,
  2. $ \nabla f(\bold{m}^*) = 0$ ,
  3. $ \nabla^2 f(\bold{m}^*)$ is positive definite, i.e., $ \bold{m}'\nabla^2 f(\bold{m}^*){\bf m}>0$ for all nonzero $ \bold{m}\in\Re^N$ ($ '$ denotes the adjoint),
where $ N$ is the dimension of the model vector $ {\bf m}$ and $ \Re^N$ is the real vector space of the model vector $ {\bf m}$ . Any vector $ {\bf m^*}$ that satisfies the standard requirements is a local minimizer of $ f({\bf m})$ .
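The standard requirements can be checked numerically. As a minimal illustration (the quadratic $ f$ , matrix, and vector below are hypothetical and not part of the log-decon problem), consider $ f({\bf m})=\frac{1}{2}{\bf m}'{\bf A}{\bf m}-{\bf b}'{\bf m}$ , whose minimizer solves $ {\bf A}{\bf m}^*={\bf b}$ :

```python
import numpy as np

# Illustrative quadratic f(m) = 1/2 m'Am - b'm with A symmetric
# positive definite; its unique minimizer m* solves A m* = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

m_star = np.linalg.solve(A, b)

grad_at_min = A @ m_star - b          # requirement 2: gradient vanishes at m*
hessian_eigs = np.linalg.eigvalsh(A)  # requirement 3: Hessian positive definite

print(np.allclose(grad_at_min, 0.0))  # True
print(np.all(hessian_eigs > 0.0))     # True
```

For this quadratic the Hessian is constant and equal to $ {\bf A}$ , so checking its eigenvalues once suffices; for a general $ f$ the check is local to $ {\bf m}^*$ .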

Newton's method is an iterative process where the solution to the problem is updated as follows:

$\displaystyle \bold{m}_{k+1}=\bold{m}_k - \lambda_k \bold{H}_k^{-1}\nabla f(\bold{m}_k),$ (1)

where $ \bold{m}_{k+1}$ is the updated solution at iteration $ k+1$ , $ \lambda_k$ is the step length computed by a line search that ensures a sufficient decrease of $ f({\bf m})$ , and $ \bold{H}_k=\nabla^2 f(\bold{m}_k)$ is the Hessian (or second derivative). In many circumstances the inverse of the Hessian cannot be computed directly. This happens, for example, when the matrix $ {\bf H}$ is too large or when operators are used rather than matrices. Fortunately, we might be able to compute an approximation of the Hessian of $ f({\bf m})$ . This strategy gives rise to quasi-Newton methods, where the way in which the Hessian approximation is built determines the method (Kelley, 1999).
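Equation (1) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Newton step solves $ \bold{H}_k \bold{d}=-\nabla f(\bold{m}_k)$ rather than forming $ \bold{H}_k^{-1}$ , and the step length $ \lambda_k$ comes from a simple backtracking line search enforcing a sufficient-decrease (Armijo) condition. The Rosenbrock test function used below is a standard example, unrelated to log-decon.

```python
import numpy as np

def newton(f, grad, hess, m0, tol=1e-8, max_iter=100):
    """Newton's method with a backtracking (Armijo) line search."""
    m = m0.astype(float)
    for _ in range(max_iter):
        g = grad(m)
        if np.linalg.norm(g) < tol:
            break
        # Solve H_k d = -grad f(m_k) instead of inverting H_k.
        d = np.linalg.solve(hess(m), -g)
        lam = 1.0
        # Backtrack until f decreases sufficiently along d.
        while f(m + lam * d) > f(m) + 1e-4 * lam * (g @ d) and lam > 1e-10:
            lam *= 0.5
        m = m + lam * d
    return m

# Standard test problem: the Rosenbrock function, minimum at (1, 1).
f = lambda m: (1 - m[0])**2 + 100 * (m[1] - m[0]**2)**2
grad = lambda m: np.array([-2 * (1 - m[0]) - 400 * m[0] * (m[1] - m[0]**2),
                           200 * (m[1] - m[0]**2)])
hess = lambda m: np.array([[2 - 400 * m[1] + 1200 * m[0]**2, -400 * m[0]],
                           [-400 * m[0], 200.0]])

m_min = newton(f, grad, hess, np.array([-1.2, 1.0]))
print(m_min)  # approximately [1., 1.]
```

Solving the linear system at each iteration is what becomes impractical when $ {\bf H}$ is too large to form, which motivates the quasi-Newton approximations discussed next.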

A possible update of the Hessian is given by the BFGS technique (Broyden, 1969; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970). The BFGS update is given by

$\displaystyle \bold{H}_{k+1}=\bold{H}_k+\frac{\bold{yy}'}{\bold{y}'\bold{s}} - \frac{(\bold{H}_k\bold{s}) (\bold{H}_k\bold{s})'}{\bold{s}'\bold{H}_k\bold{s}},$ (2)

where $ \bold{s}=\bold{m}_{k+1}-\bold{m}_k$ and $ \bold{y}=\nabla f(\bold{m}_{k+1})-\nabla f(\bold{m}_k)$ . In practice, however, we instead write the previous equation in terms of the inverse matrix. We then have

$\displaystyle \bold{H}_{k+1}^{-1}=\left(\bold{I}-\frac{\bold{sy}'}{\bold{y}'\bold{s}}\right)\bold{H}_k^{-1}\left(\bold{I}-\frac{\bold{ys}'}{\bold{y}'\bold{s}}\right)+ \frac{\bold{ss}'}{\bold{y}'\bold{s}}.$ (3)

In addition, we use the history of the iterations to compute the new Hessian rather than storing the full matrix $ \bold{H}_k^{-1}$ . This requires that a gradient step vector $ {\bf y}$ and a solution step vector $ {\bf s}$ be kept in memory after each iteration. Consequently, this method might not be affordable with large data and model spaces. In the next section, a modified version of the BFGS method that limits the storage needed to compute the update of the Hessian is proposed.
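Equation (3) can be written directly in code. The sketch below (an illustration with small dense matrices, not a practical large-scale implementation) applies one inverse-BFGS update and checks the secant condition $ \bold{H}_{k+1}^{-1}\bold{y}=\bold{s}$ , which any BFGS update must satisfy by construction:

```python
import numpy as np

def bfgs_inverse_update(H_inv, s, y):
    """Equation (3): update the inverse Hessian approximation from
    the step s = m_{k+1} - m_k and the gradient change
    y = grad f(m_{k+1}) - grad f(m_k)."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)          # I - s y' / (y' s)
    return V @ H_inv @ V.T + rho * np.outer(s, s)

# On a quadratic f(m) = 1/2 m'Am - b'm we have y = A s exactly.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
s = np.array([1.0, 2.0])
y = A @ s

H_inv = bfgs_inverse_update(np.eye(2), s, y)
print(np.allclose(H_inv @ y, s))  # True: the secant condition holds
```

Note that the update needs only the pair $ ({\bf s},{\bf y})$ , not the true Hessian; storing these pairs across iterations is exactly the memory cost that the limited-memory variant of the next section bounds.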
