The L-BFGS-B algorithm

Next: Application to the dip Up: Optimization Previous: Trust region methods

The L-BFGS-B algorithm

The L-BFGS-B algorithm is an extension of the L-BFGS algorithm to handle simple bounds on the model Zhu et al. (1997). The L-BFGS algorithm is a very efficient algorithm for solving large scale problems. L-BFGS-B borrows ideas from the trust region methods while keeping the L-BFGS update of the Hessian and line search algorithms. Methods based completely on the trust region techniques exist and are freely available. Among them, the program SBMIN from the LANCELOT package (written in Fortran 77) is very popular Conn et al. (1992). The original L-BFGS-B is freely available in Fortran 77 from the Northwestern University webpage.

The L-BFGS-B works as follows for one iteration:

1.

Find an approximation of the Cauchy point for

$\begin{displaymath} \Phi(\lambda)=m({\bf x}(\lambda))=m(P({\bf x_k - \lambda g_k})), \end{displaymath}$

(8)

with $m({\bf x})$ being the quadratic form in equation (5). Identify $A({\bf x})$ and $I({\bf x})$ .

2.

Minimize the quadratic form in equation (5) for the unconstrained variables. This step gives a search direction.

3.

Perform a line search along the new search direction to find the minimum of $f({\bf x})$ .

4.

Update the Hessian with the L-BFGS method Nocedal (1980) and check if convergence is obtained.

In a ``pure'' trust region method, step (3) is replaced by tests assessing the success of the update by measuring the difference between the quadratic form and the true objective function at the update. In addition, the radius $\Delta$ is also examined.

The L-BFGS-B algorithm is affordable for very large problems. The memory requirement is roughly (12+2m)N where m is the number of BFGS updates kept in memory and N the size of the model space. In practice, m=5 is a typical choice. Per iteration, the number of multiplications range from 4mN+N when no constraints are applied to m²N when all variables are bounded. The program offers the freedom to have different bounds for different points of the model space. In addition, some points can be constrained while others are not.

There are three different stopping criteria for the L-BFGS-B algorithm. First the program stops when the maximum number of iterations is reached. Or, the program stops when the decrease of the objective function becomes small enough. Or, the program stops when the norm of the projected gradient (in a $\ell^\infty$ sense) is small enough.

Now for the bells and whistles, tests indicate that the L-BGFS-B algorithm ran in single precision with no constraints is not quite twice as slow as a conjugate gradient solver per iteration. This result is quite remarkable when considering that L-BFGS-B works for any type of non-linear (or linear) problem with line searches. In addition, the number of iterations needed to convergence is almost identical for both L-BFGS-B and the conjugate gradient solver. In the next section, the L-BFGS-B algorithm is utilized to estimate local dips from seismic data.

Next: Application to the dip Up: Optimization Previous: Trust region methods

Stanford Exploration Project
10/23/2004