Comparing the convergence of optimization techniques in a fair manner can be quite difficult.
My steepest descent algorithm includes a termination criterion
based on the Armijo rule only, namely, a sufficient decrease condition on the objective
function (whereas L-BFGS uses the Wolfe conditions).
In addition, the steepest descent and L-BFGS algorithms use different line searches,
which affects both convergence and computational performance.
The L-BFGS line search is based on the Moré and Thuente method, which uses
bracketing and quadratic/cubic interpolation. The steepest-descent algorithm
uses a very simple scheme in which the step length is halved until the sufficient decrease
condition is satisfied (see the sketch below). Therefore, some of the computational differences come from
the line-search algorithms and stopping criteria, in addition to the inherent convergence
properties of the two methods.
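The step-halving scheme is simple enough to sketch in a few lines. This is a minimal illustration of the idea, not the actual implementation; the function name, parameters, and default values are hypothetical.

```python
import numpy as np

def backtracking_halving(f, grad, x, p, alpha0=1.0, c1=1e-4, max_halvings=50):
    """Step-halving line search with an Armijo sufficient decrease test.

    A minimal sketch of the scheme described above. `p` is assumed to be
    a descent direction, so the directional derivative below is negative.
    """
    fx = f(x)
    slope = np.dot(grad(x), p)  # directional derivative at x along p
    alpha = alpha0
    for _ in range(max_halvings):
        # Armijo condition: accept the step if it decreases f sufficiently.
        if f(x + alpha * p) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= 0.5  # halve the step length and try again
    return alpha  # fallback after max_halvings; caller should check progress
```

On an easy problem such as f(x) = ½‖x‖² with p = -∇f(x), the Armijo test accepts α = 1 on the first try; the cost of this scheme shows up on ill-conditioned problems, where many halvings, and thus many extra function evaluations, may be needed before a step is accepted.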