
The limited memory BFGS method

Nocedal (1980) derives a technique that partially solves the storage problem caused by the BFGS update. Instead of keeping all the $\bold{s}$ and $\bold{y}$ vectors from past iterations, we update the Hessian using the information from the $l$ previous iterations only, where $l$ is chosen by the end-user. This implies that as long as the number of iterations is smaller than $l$, we have the usual BFGS update, and once it exceeds $l$, we have a limited memory BFGS (L-BFGS) update.
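
Concretely, the method keeps only the $l$ most recent pairs $\bold{s}_i$ and $\bold{y}_i$ (the step and gradient-difference vectors of the previous section), discarding the oldest pair as each new one arrives. The following Python sketch of this bookkeeping is mine, not part of Nocedal's presentation; the names and the use of a deque are illustrative only:
\begin{verbatim}
from collections import deque

l = 5                       # storage limit l, chosen by the end-user
s_pairs = deque(maxlen=l)   # the l most recent s vectors, oldest first
y_pairs = deque(maxlen=l)   # the l most recent y vectors, oldest first

def store_pair(x_old, x_new, g_old, g_new):
    """Record s_k = x_{k+1} - x_k and y_k = grad_{k+1} - grad_k
    (NumPy arrays assumed).  While fewer than l pairs exist this keeps
    the full BFGS history; afterwards the deque silently drops the
    oldest pair, which is exactly the L-BFGS truncation."""
    s_pairs.append(x_new - x_old)
    y_pairs.append(g_new - g_old)
\end{verbatim}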

I give the updating formulas of the Hessian as presented by Nocedal (1980). First, we define

\begin{displaymath}
\rho_i = 1/\bold{y}_i'\bold{s}_i \mbox{, } \quad
\bold{v}_i = (\bold{I} - \rho_i\bold{y}_i\bold{s}_i') \quad \mbox{ and } \quad
\bold{H}^{-1} = \bold{B}.
\end{displaymath}

As described above, when $k$, the iteration number, obeys $k+1 \leq l$, where $l$ is the storage limit, we have the BFGS update
\begin{eqnarray}
\bold{B}_{k+1} &=& \bold{v}_k'\bold{v}_{k-1}'\cdots\bold{v}_0'\,\bold{B}_0\,\bold{v}_0\cdots\bold{v}_{k-1}\bold{v}_k \nonumber\\
& & +\,\bold{v}_k'\cdots\bold{v}_1'\,\rho_0\bold{s}_0\bold{s}_0'\,\bold{v}_1\cdots\bold{v}_k \nonumber\\
& & +\,\cdots \nonumber\\
& & +\,\rho_k\bold{s}_k\bold{s}_k'. \nonumber
\end{eqnarray}
(75)
For $k+1 > l$ we have the limited memory update
\begin{eqnarray}
\bold{B}_{k+1} &=& \bold{v}_k'\bold{v}_{k-1}'\cdots\bold{v}_{k-l+1}'\,\bold{B}_0\,\bold{v}_{k-l+1}\cdots\bold{v}_{k-1}\bold{v}_k \nonumber\\
& & +\,\bold{v}_k'\cdots\bold{v}_{k-l+2}'\,\rho_{k-l+1}\bold{s}_{k-l+1}\bold{s}_{k-l+1}'\,\bold{v}_{k-l+2}\cdots\bold{v}_k \nonumber\\
& & +\,\cdots \nonumber\\
& & +\,\rho_k\bold{s}_k\bold{s}_k'. \nonumber
\end{eqnarray}
(76)
These equations show that the update requires only the $l$ most recent pairs $\bold{s}_i$ and $\bold{y}_i$, so the Hessian itself never needs to be stored.
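
In practice the matrix products in equations (75) and (76) are never formed explicitly: the product of $\bold{B}_{k+1}$ with a gradient, which is all the search direction requires, can be accumulated from the stored pairs alone with the two-loop recursion of Nocedal (1980). A minimal Python sketch follows; the function and variable names are illustrative, and $\bold{B}_0$ is assumed to be a scaled identity:
\begin{verbatim}
import numpy as np

def lbfgs_direction(grad, s_list, y_list, gamma=1.0):
    # Apply the inverse-Hessian approximation B to `grad` using the
    # two-loop recursion; s_list/y_list hold the stored pairs, oldest
    # first.  B_0 = gamma * I is the assumed initial matrix.
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    q = grad.copy()
    alphas = []
    # First loop: newest pair to oldest, peeling off the v_i' factors.
    for s, y, rho in zip(reversed(s_list), reversed(y_list),
                         reversed(rhos)):
        alpha = rho * np.dot(s, q)
        alphas.append(alpha)
        q = q - alpha * y
    r = gamma * q                  # multiply by B_0
    # Second loop: oldest pair to newest, restoring the v_i factors.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos),
                                  reversed(alphas)):
        beta = rho * np.dot(y, r)
        r = r + (alpha - beta) * s
    return -r                      # search direction d_k = -B grad
\end{verbatim}
The recursion costs $O(ln)$ operations per iteration for an $n$-dimensional model, instead of the $O(n^2)$ storage and work a full Hessian would demand.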

Usually the L-BFGS method is implemented with a line search for the step length $\lambda_k$ to ensure a sufficient decrease of the misfit function. Convergence of the L-BFGS method is guaranteed if $\lambda_k$ in equation ([*]) satisfies the Wolfe conditions (Kelley, 1999):
\begin{eqnarray}
f(\bold{x}_k+\lambda_k\bold{d}_k) &\leq& f(\bold{x}_k)
 + \mu\lambda_k\nabla f(\bold{x}_k)'\bold{d}_k, \\
\vert\nabla f(\bold{x}_k+\lambda_k\bold{d}_k)'\bold{d}_k\vert &\leq& \nu\vert\nabla
 f(\bold{x}_k)'\bold{d}_k\vert.
\end{eqnarray}
(77)
(78)
$\mu$ and $\nu$ are constants chosen a priori, and $\bold{d}_k = -\bold{B}_k\nabla f(\bold{x}_k)$. We set $\mu = 10^{-4}$ and $\nu = 0.9$, as proposed by Liu and Nocedal (1989). Equation ([*]) is a sufficient decrease condition that every line search algorithm must satisfy; equation ([*]) is a curvature condition. The line search algorithm has to be carefully designed, since it absorbs most of the computing time. I programmed a line search based on the Moré and Thuente (1994) method. Because the line search is expensive, the step length $\lambda_k = 1$ is always tested first; this saves a lot of computing time and is also recommended by Liu and Nocedal (1989). I now give the algorithm used to minimize a nonlinear objective function.
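
As an illustration of how conditions (77) and (78) are tested, here is a deliberately simplified Python line search using the constants above. The actual Moré and Thuente (1994) method uses safeguarded interpolation over a bracketing interval rather than plain halving, so this sketch shows only the structure of the test, not their algorithm:
\begin{verbatim}
import numpy as np

def wolfe_step(f, grad_f, x, d, mu=1e-4, nu=0.9, max_tries=20):
    # Test lambda = 1 first, then halve until the Wolfe conditions
    # (77)-(78) hold.  Halving can only shrink the step, so unlike
    # the More-Thuente method this toy search may fail to satisfy
    # the curvature condition (78).
    f0 = f(x)
    slope0 = float(np.dot(grad_f(x), d))  # negative for a descent direction
    lam = 1.0
    for _ in range(max_tries):
        x_new = x + lam * d
        sufficient_decrease = f(x_new) <= f0 + mu * lam * slope0
        curvature = abs(np.dot(grad_f(x_new), d)) <= nu * abs(slope0)
        if sufficient_decrease and curvature:
            return lam
        lam *= 0.5
    return lam
\end{verbatim}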

