
The limited-memory BFGS method

Nocedal (1980) derives a technique that partially solves the storage problem caused by the BFGS update. Instead of keeping the vectors $ \bold{s}$ and $ \bold{y}$ from all past iterations, we update the Hessian using the information from only the $ l$ previous iterations, where $ l$ is chosen by the user. This implies that as long as the number of iterations is smaller than $ l$ , we have the usual BFGS update, and once it exceeds $ l$ , we have a limited-memory BFGS (L-BFGS) update.
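As an illustration (with hypothetical sizes, not taken from the text above): for a model with $ n = 10^6$ parameters and $ l = 5$ , storing the pairs $ (\bold{s}_i,\bold{y}_i)$ costs $ 2ln = 10^7$ numbers, whereas the full $ n \times n$ Hessian approximation would require $ n^2 = 10^{12}$ .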

I give the updating formulas for the inverse Hessian approximation $ \bold{B}$ as presented by Nocedal (1980). First, we define

$\displaystyle \rho_i = 1/(\bold{y}_i'\bold{s}_i)\;, \qquad
\bold{v}_i = I - \rho_i\,\bold{y}_i\bold{s}_i'\;, \qquad
\bold{H}^{-1} = \bold{B}.$

As described above, when $ k$ , the iteration number, obeys $ k+1 \leq l$ , where $ l$ is the storage limit, we have the BFGS update
\begin{align}
\bold{B}_{k+1} &= \bold{v}_k'\bold{v}_{k-1}'\cdots\bold{v}_0'\,\bold{B}_0\,
\bold{v}_0\cdots\bold{v}_{k-1}\bold{v}_k \nonumber\\
&\quad + \bold{v}_k'\cdots\bold{v}_1'\,\rho_0\,\bold{s}_0\bold{s}_0'\,
\bold{v}_1\cdots\bold{v}_k \nonumber\\
&\qquad \vdots \nonumber\\
&\quad + \bold{v}_k'\,\rho_{k-1}\,\bold{s}_{k-1}\bold{s}_{k-1}'\,\bold{v}_k \nonumber\\
&\quad + \rho_k\,\bold{s}_k\bold{s}_k'. \tag{4}
\end{align}

For $ k+1 > l$ we have the limited-memory update
\begin{align}
\bold{B}_{k+1} &= \bold{v}_k'\bold{v}_{k-1}'\cdots\bold{v}_{k-l+1}'\,\bold{B}_0\,
\bold{v}_{k-l+1}\cdots\bold{v}_{k-1}\bold{v}_k \nonumber\\
&\quad + \bold{v}_k'\cdots\bold{v}_{k-l+2}'\,\rho_{k-l+1}\,
\bold{s}_{k-l+1}\bold{s}_{k-l+1}'\,\bold{v}_{k-l+2}\cdots\bold{v}_k \nonumber\\
&\qquad \vdots \nonumber\\
&\quad + \bold{v}_k'\,\rho_{k-1}\,\bold{s}_{k-1}\bold{s}_{k-1}'\,\bold{v}_k \nonumber\\
&\quad + \rho_k\,\bold{s}_k\bold{s}_k'. \tag{5}
\end{align}

Equations (4) and (5) show how the update of the inverse Hessian approximation is calculated from the stored pairs $ (\bold{s}_i,\bold{y}_i)$ .
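In practice, the matrices in equations (4) and (5) are rarely formed explicitly; only the product $ \bold{B}_k\nabla f(\bold{x}_k)$ is needed to obtain the search direction, and it can be computed directly from the stored pairs. The sketch below (in Python with NumPy; the function name and argument layout are mine, not from the original text) implements the standard two-loop recursion of Nocedal (1980) that evaluates these updates:

import numpy as np

def lbfgs_direction(grad, s_list, y_list, b0_diag=None):
    """Search direction d = -B*grad from the stored pairs.

    s_list, y_list: the most recent pairs s_i = x_{i+1} - x_i and
    y_i = grad_{i+1} - grad_i, ordered oldest first. b0_diag is the
    diagonal of the initial approximation B_0 (identity if None).
    """
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    q = grad.copy()
    alphas = []
    # First loop (newest to oldest): apply the v_i' factors.
    for s, y, rho in zip(reversed(s_list), reversed(y_list),
                         reversed(rhos)):
        alpha = rho * np.dot(s, q)
        alphas.append(alpha)
        q = q - alpha * y
    # Apply the initial inverse Hessian approximation B_0.
    r = q if b0_diag is None else b0_diag * q
    # Second loop (oldest to newest): apply the v_i factors and the
    # rank-one terms rho_i s_i s_i'.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos),
                                  reversed(alphas)):
        beta = rho * np.dot(y, r)
        r = r + (alpha - beta) * s
    return -r

With $ l$ pairs kept in s_list and y_list this evaluates the limited-memory update of equation (5); with fewer than $ l$ pairs it reduces to the full BFGS update of equation (4).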

Usually the L-BFGS method is implemented with a line search for the step length $ \lambda_k$ to ensure a sufficient decrease of the misfit function. Convergence properties of the L-BFGS method are guaranteed if $ \lambda_k$ in equation (2) satisfies the Wolfe conditions (Kelley, 1999):

\begin{align}
f(\bold{x}_k+\lambda_k\bold{d}_k) &\leq f(\bold{x}_k)
+ \mu\,\lambda_k\,\nabla f(\bold{x}_k)'\bold{d}_k, \tag{6}\\
\vert\nabla f(\bold{x}_k+\lambda_k\bold{d}_k)'\bold{d}_k\vert &\leq
\nu\,\vert\nabla f(\bold{x}_k)'\bold{d}_k\vert. \tag{7}
\end{align}

Here $ \nu$ and $ \mu$ are constants chosen a priori, and $ \bold{d}_k=- {\bf B}_k \nabla f(\bold{x}_k)$ is the search direction. Following Liu and Nocedal (1989), we set $ \nu=0.9$ and $ \mu=10^{-4}$ . Equation (6) is a sufficient-decrease condition that all line-search algorithms must satisfy; equation (7) is a curvature condition. The line-search algorithm has to be carefully designed, since it absorbs most of the computing time. I programmed a line search based on the Moré and Thuente (1994) method. Because the line search is time-consuming, the step length $ \lambda_k=1$ is always tested first; this saves a lot of computing time and is also recommended by Liu and Nocedal (1989). I now give the algorithm used to minimize a nonlinear objective function.
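As a minimal sketch of this logic (a plain backtracking loop with a simple halving strategy, not a reimplementation of the Moré and Thuente (1994) interpolation scheme; all names are illustrative), a step length satisfying conditions (6) and (7) can be searched for starting from $ \lambda_k=1$ :

import numpy as np

def wolfe_line_search(f, grad_f, x, d, mu=1e-4, nu=0.9, max_trials=30):
    """Return a step length satisfying the Wolfe conditions (6)-(7).

    The unit step is tested first, as recommended by Liu and
    Nocedal (1989); on failure the trial step is halved.
    """
    f0 = f(x)
    g0d = np.dot(grad_f(x), d)  # directional derivative, < 0 for descent
    lam = 1.0
    for _ in range(max_trials):
        x_trial = x + lam * d
        decrease_ok = f(x_trial) <= f0 + mu * lam * g0d              # eq. (6)
        curvature_ok = abs(np.dot(grad_f(x_trial), d)) <= nu * abs(g0d)  # eq. (7)
        if decrease_ok and curvature_ok:
            return lam
        lam *= 0.5
    return lam  # fall back to the smallest trial step

Near the solution the L-BFGS direction is well scaled, so the unit step typically passes both tests immediately, which is why testing $ \lambda_k=1$ first saves so much computing time.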

