The L-BFGS-B algorithm is affordable for very large problems. The memory requirement is roughly (12+2m)N where m is the number of BFGS updates kept in memory and N the size of the model space. In practice, m=5 is a typical choice. Per iteration, the number of multiplications range from 4mN+N when no constraints are applied to m2N when all variables are bounded. The program offers the freedom to have different bounds for different points of the model space. In addition, some points can be constrained while others are not.
There are three different stopping criteria for the L-BFGS-B algorithm.
First the program stops when the maximum number of iterations is reached.
Or, the program stops when the decrease of the objective
function becomes small enough.
Or, the program stops when the norm of the projected gradient (in a sense) is small enough.
Tests indicate that the L-BGFS-B algorithm ran in single precision with no constraints is not quite twice as slow as a conjugate gradient solver per iteration. This result is quite remarkable when considering that L-BFGS-B works for any type of non-linear (or linear) problem with line searches. In addition, the number of iterations needed to convergence is almost identical for both L-BFGS-B and the conjugate gradient solver. In the next section, I present a method to estimate sparse radon transforms.