
The limited memory BFGS method

Nocedal (1980) derives a technique that partially solves the storage problem caused by the BFGS update. Instead of keeping all the $\bold{s}$ and $\bold{y}$ vectors from past iterations, we update the Hessian using the information from the $l$ previous iterations only, where $l$ is chosen by the end-user. This implies that as long as the number of iterations is smaller than $l$, we have the usual BFGS update, and once it exceeds $l$, we have a limited memory BFGS (L-BFGS) update.
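
Concretely, the method keeps only the $l$ most recent pairs $\bold{s}_i$ and $\bold{y}_i$ (the step and gradient-difference vectors of the previous section), discarding the oldest pair as each new one arrives. The following Python sketch of this bookkeeping is mine, not part of Nocedal's presentation; the names and the use of a deque are illustrative only:
\begin{verbatim}
from collections import deque

l = 5                       # storage limit l, chosen by the end-user
s_pairs = deque(maxlen=l)   # the l most recent s vectors, oldest first
y_pairs = deque(maxlen=l)   # the l most recent y vectors, oldest first

def store_pair(x_old, x_new, g_old, g_new):
    """Record s_k = x_{k+1} - x_k and y_k = grad_{k+1} - grad_k
    (NumPy arrays assumed).  While fewer than l pairs exist this keeps
    the full BFGS history; afterwards the deque silently drops the
    oldest pair, which is exactly the L-BFGS truncation."""
    s_pairs.append(x_new - x_old)
    y_pairs.append(g_new - g_old)
\end{verbatim}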

I give the updating formulas of the Hessian as presented by Nocedal (1980). First, we define

\begin{displaymath}
\rho_i = 1/\bold{y}_i'\bold{s}_i \mbox{, } \quad
\bold{v}_i = (\bold{I} - \rho_i\bold{y}_i\bold{s}_i') \quad \mbox{ and } \quad
\bold{H}^{-1} = \bold{B}.
\end{displaymath}

As described above, when $k$, the iteration number, obeys $k+1 \leq l$, where $l$ is the storage limit, we have the BFGS update
\begin{eqnarray}
\bold{B}_{k+1} &=& \bold{v}_k'\bold{v}_{k-1}'\cdots\bold{v}_0'\,\bold{B}_0\,\bold{v}_0\cdots\bold{v}_{k-1}\bold{v}_k \nonumber\\
& & +\,\bold{v}_k'\cdots\bold{v}_1'\,\rho_0\bold{s}_0\bold{s}_0'\,\bold{v}_1\cdots\bold{v}_k \nonumber\\
& & +\,\cdots \nonumber\\
& & +\,\rho_k\bold{s}_k\bold{s}_k'. \nonumber
\end{eqnarray}
(75)
For $k+1 > l$ we have the limited memory update
\begin{eqnarray}
\bold{B}_{k+1} &=& \bold{v}_k'\bold{v}_{k-1}'\cdots\bold{v}_{k-l+1}'\,\bold{B}_0\,\bold{v}_{k-l+1}\cdots\bold{v}_{k-1}\bold{v}_k \nonumber\\
& & +\,\bold{v}_k'\cdots\bold{v}_{k-l+2}'\,\rho_{k-l+1}\bold{s}_{k-l+1}\bold{s}_{k-l+1}'\,\bold{v}_{k-l+2}\cdots\bold{v}_k \nonumber\\
& & +\,\cdots \nonumber\\
& & +\,\rho_k\bold{s}_k\bold{s}_k'. \nonumber
\end{eqnarray}
(76)
These equations show that the update requires only the $l$ most recent pairs $\bold{s}_i$ and $\bold{y}_i$, so the Hessian itself never needs to be stored.
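
In practice the matrix products in equations (75) and (76) are never formed explicitly: the product of $\bold{B}_{k+1}$ with a gradient, which is all the search direction requires, can be accumulated from the stored pairs alone with the two-loop recursion of Nocedal (1980). A minimal Python sketch follows; the function and variable names are illustrative, and $\bold{B}_0$ is assumed to be a scaled identity:
\begin{verbatim}
import numpy as np

def lbfgs_direction(grad, s_list, y_list, gamma=1.0):
    # Apply the inverse-Hessian approximation B to `grad` using the
    # two-loop recursion; s_list/y_list hold the stored pairs, oldest
    # first.  B_0 = gamma * I is the assumed initial matrix.
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    q = grad.copy()
    alphas = []
    # First loop: newest pair to oldest, peeling off the v_i' factors.
    for s, y, rho in zip(reversed(s_list), reversed(y_list),
                         reversed(rhos)):
        alpha = rho * np.dot(s, q)
        alphas.append(alpha)
        q = q - alpha * y
    r = gamma * q                  # multiply by B_0
    # Second loop: oldest pair to newest, restoring the v_i factors.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos),
                                  reversed(alphas)):
        beta = rho * np.dot(y, r)
        r = r + (alpha - beta) * s
    return -r                      # search direction d_k = -B grad
\end{verbatim}
The recursion costs $O(ln)$ operations per iteration for an $n$-dimensional model, instead of the $O(n^2)$ storage and work a full Hessian would demand.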

Usually the L-BFGS method is implemented with a line search for the step length $\lambda_k$ to ensure a sufficient decrease of the misfit function. Convergence of the L-BFGS method is guaranteed if $\lambda_k$ in equation ([*]) satisfies the Wolfe conditions (Kelley, 1999):
\begin{eqnarray}
f(\bold{x}_k+\lambda_k\bold{d}_k) &\leq& f(\bold{x}_k)
 + \mu\lambda_k\nabla f(\bold{x}_k)'\bold{d}_k, \\
\vert\nabla f(\bold{x}_k+\lambda_k\bold{d}_k)'\bold{d}_k\vert &\leq& \nu\vert\nabla
 f(\bold{x}_k)'\bold{d}_k\vert.
\end{eqnarray}
(77)
(78)
$\mu$ and $\nu$ are constants chosen a priori, and $\bold{d}_k = -\bold{B}_k\nabla f(\bold{x}_k)$. We set $\mu = 10^{-4}$ and $\nu = 0.9$, as proposed by Liu and Nocedal (1989). Equation ([*]) is a sufficient decrease condition that every line search algorithm must satisfy; equation ([*]) is a curvature condition. The line search algorithm has to be carefully designed, since it absorbs most of the computing time. I programmed a line search based on the Moré and Thuente (1994) method. Because the line search is expensive, the step length $\lambda_k = 1$ is always tested first; this saves a lot of computing time and is also recommended by Liu and Nocedal (1989). I now give the algorithm used to minimize a nonlinear objective function.
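
As an illustration of how conditions (77) and (78) are tested, here is a deliberately simplified Python line search using the constants above. The actual Moré and Thuente (1994) method uses safeguarded interpolation over a bracketing interval rather than plain halving, so this sketch shows only the structure of the test, not their algorithm:
\begin{verbatim}
import numpy as np

def wolfe_step(f, grad_f, x, d, mu=1e-4, nu=0.9, max_tries=20):
    # Test lambda = 1 first, then halve until the Wolfe conditions
    # (77)-(78) hold.  Halving can only shrink the step, so unlike
    # the More-Thuente method this toy search may fail to satisfy
    # the curvature condition (78).
    f0 = f(x)
    slope0 = float(np.dot(grad_f(x), d))  # negative for a descent direction
    lam = 1.0
    for _ in range(max_tries):
        x_new = x + lam * d
        sufficient_decrease = f(x_new) <= f0 + mu * lam * slope0
        curvature = abs(np.dot(grad_f(x_new), d)) <= nu * abs(slope0)
        if sufficient_decrease and curvature:
            return lam
        lam *= 0.5
    return lam
\end{verbatim}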

