
The limited-memory BFGS method

Nocedal (1980) derives a technique that partially solves the storage problem caused by the BFGS update. Instead of keeping all the vectors $ \bold{s}$ and $ \bold{y}$ from past iterations, we update the Hessian using information from only the $ l$ most recent iterations, where $ l$ is chosen by the user. Thus, while the number of iterations is at most $ l$ , we have the usual BFGS update; once it exceeds $ l$ , we have the limited-memory BFGS (L-BFGS) update.

I give the updating formulas for the inverse Hessian as presented by Nocedal (1980). First, we define

$\displaystyle \rho_i = 1/\bold{y}_i'\bold{s}_i, \qquad \bold{v}_i = \bold{I}-\rho_i\bold{y}_i\bold{s}_i', \qquad$    and$\displaystyle \qquad \bold{H}^{-1}=\bold{B}.$

As described above, when $ k$ , the iteration number, obeys $ k+1 \leq l$ , where $ l$ is the storage limit, we have the BFGS update
$\displaystyle \bold{B}_{k+1}$ $\displaystyle =$ $\displaystyle \bold{v}_k'\bold{v}_{k-1}'\cdots\bold{v}_0'\,\bold{B}_0\,\bold{v}_0\cdots\bold{v}_{k-1}\bold{v}_k$  
    $\displaystyle +\bold{v}_k'\cdots\bold{v}_1'\,\rho_0\bold{s}_0\bold{s}_0'\,\bold{v}_1\cdots\bold{v}_k$  
    $\displaystyle \vdots$ (4)
    $\displaystyle +\bold{v}_k'\,\rho_{k-1}\bold{s}_{k-1}\bold{s}_{k-1}'\,\bold{v}_k$  
    $\displaystyle +\rho_k\bold{s}_k\bold{s}_k'.$  
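The structure of update (4) can be checked numerically. The following minimal Python sketch (the vectors and the choice $ \bold{B}_0 = \bold{I}$ are illustrative assumptions, not from the text) applies the $ k=0$ case of equation (4) and verifies that the resulting $ \bold{B}_1$ satisfies the secant condition $ \bold{B}_1\bold{y}_0 = \bold{s}_0$ :

```python
import numpy as np

# One step of the product-form update (4) for k = 0, checking that the
# resulting B_1 satisfies the secant condition B_1 y_0 = s_0.
# The vectors below are arbitrary illustrative values.
s0 = np.array([1.0, 2.0])
y0 = np.array([3.0, 1.0])
rho0 = 1.0 / np.dot(y0, s0)
I = np.eye(2)
v0 = I - rho0 * np.outer(y0, s0)               # v_0 = I - rho_0 y_0 s_0'
B0 = I                                         # initial inverse Hessian
B1 = v0.T @ B0 @ v0 + rho0 * np.outer(s0, s0)  # equation (4) with k = 0
print(np.allclose(B1 @ y0, s0))                # prints True
```

The secant condition holds because $ \bold{v}_0\bold{y}_0 = \bold{y}_0 - \rho_0\bold{y}_0(\bold{s}_0'\bold{y}_0) = \bold{0}$ , which annihilates the first term, while the rank-one term contributes $ \rho_0\bold{s}_0(\bold{s}_0'\bold{y}_0) = \bold{s}_0$ .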

For $ k+1 > l$ we have the limited-memory update
$\displaystyle \bold{B}_{k+1}$ $\displaystyle =$ $\displaystyle \bold{v}_k'\bold{v}_{k-1}'\cdots\bold{v}_{k-l+1}'\,\bold{B}_0\,\bold{v}_{k-l+1}\cdots\bold{v}_{k-1}\bold{v}_k$  
    $\displaystyle +\bold{v}_k'\cdots\bold{v}_{k-l+2}'\,\rho_{k-l+1}\bold{s}_{k-l+1}\bold{s}_{k-l+1}'\,\bold{v}_{k-l+2}\cdots\bold{v}_k$  
    $\displaystyle \vdots$ (5)
    $\displaystyle +\bold{v}_k'\,\rho_{k-1}\bold{s}_{k-1}\bold{s}_{k-1}'\,\bold{v}_k$  
    $\displaystyle +\rho_k\bold{s}_k\bold{s}_k'.$  

These equations show how the update of the inverse Hessian $ \bold{B}$ is calculated; the limited-memory form (5) differs from (4) only in that it retains the $ l$ most recent pairs $ (\bold{s}_i,\bold{y}_i)$ rather than all of them.
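In practice, the product forms (4) and (5) are never assembled as matrices; only the matrix-vector product $ \bold{B}_{k+1}\nabla f$ is needed, and it can be evaluated from the stored pairs alone. A minimal Python sketch of Nocedal's two-loop recursion follows (the function name and the common scaled-identity choice for $ \bold{B}_0$ are my assumptions, not from the text):

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Apply the limited-memory inverse-Hessian update to a gradient
    using Nocedal's two-loop recursion, which evaluates the product
    form of updates (4)-(5) without ever forming B explicitly.

    s_list, y_list hold the l most recent pairs s_i = x_{i+1} - x_i and
    y_i = grad_{i+1} - grad_i, ordered oldest first.
    """
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest to oldest pair.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        alphas.append(alpha)
        q -= alpha * y
    # Initial inverse Hessian B_0 = gamma * I (a standard scaling choice).
    if s_list:
        gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: oldest to newest pair.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return r  # r = B_{k+1} grad
```

The search direction is then $ \bold{d}_k = -$ `lbfgs_direction(grad, s_list, y_list)`. The cost per iteration is $ O(ln)$ storage and work for $ n$ unknowns, which is the point of the limited-memory scheme.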

Usually the L-BFGS method is implemented with a line search for the step length $ \lambda_k$ to ensure a sufficient decrease of the misfit function. Convergence properties of the L-BFGS method are guaranteed if $ \lambda_k$ in equation (2) satisfies the Wolfe conditions (Kelley, 1999):

$\displaystyle f(\bold{x}_k+\lambda_k\bold{d}_k) \leq f(\bold{x}_k) + \mu \lambda_k\nabla f(\bold{x}_k)'\bold{d}_k,$ (6)
$\displaystyle \vert\nabla f(\bold{x}_k+\lambda_k\bold{d}_k)'\bold{d}_k\vert \leq \nu\vert\nabla f(\bold{x}_k)'\bold{d}_k\vert.$ (7)

$ \nu$ and $ \mu$ are constants chosen a priori, and $ \bold{d}_k=- {\bf B}_k \nabla f(\bold{x}_k)$ . We set $ \nu=0.9$ and $ \mu=10^{-4}$ , as proposed by Liu and Nocedal (1989). Equation (6) is a sufficient-decrease condition that every line search algorithm must satisfy; equation (7) is a curvature condition. The line search algorithm has to be carefully designed, since it absorbs most of the computing time. I programmed a line search based on the Moré and Thuente (1994) method. Because the line search is time consuming, the step length $ \lambda_k=1$ is always tested first; this saves a lot of computing time and is also recommended by Liu and Nocedal (1989). I now give the algorithm used to minimize any objective function involving nonlinear problems.
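The logic of testing $ \lambda_k=1$ first and accepting a step only when both Wolfe conditions hold can be sketched as follows. This is a simplified backtracking stand-in for illustration, not the Moré-Thuente bracketing method itself; the function name and halving factor are my assumptions:

```python
import numpy as np

def wolfe_line_search(f, grad_f, x, d, mu=1e-4, nu=0.9, max_iter=20):
    """Find a step length lam along direction d satisfying the Wolfe
    conditions (6)-(7), trying the unit step first as recommended by
    Liu and Nocedal (1989). Simplified backtracking sketch: a real
    implementation (e.g. More-Thuente) brackets and interpolates.
    """
    f0 = f(x)
    g0d = np.dot(grad_f(x), d)   # directional derivative; < 0 for descent
    lam = 1.0                    # always test the unit step first
    for _ in range(max_iter):
        x_new = x + lam * d
        sufficient = f(x_new) <= f0 + mu * lam * g0d              # eq. (6)
        curvature = abs(np.dot(grad_f(x_new), d)) <= nu * abs(g0d)  # eq. (7)
        if sufficient and curvature:
            return lam
        lam *= 0.5  # shrink and retry
    return lam
```

Near the solution the unit step is usually accepted immediately, which is what preserves the superlinear convergence of the quasi-Newton iteration and avoids extra function evaluations.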
