(2)    H(r) = \begin{cases} r^2/2, & |r| \le \epsilon \\ \epsilon\,|r| - \epsilon^2/2, & |r| > \epsilon \end{cases}

For small residuals *r*,
the Huber function reduces to the usual *L_2*
least-squares penalty function,
and for large residuals it grows only linearly, like an *L_1* penalty:

(3)    H(r) = \epsilon\,|r| - \epsilon^2/2, \qquad |r| > \epsilon

The derivative of the Huber function is what we commonly call the clip function:

(4)    H'(r) = \mathrm{clip}(r) = \begin{cases} r, & |r| \le \epsilon \\ \epsilon\,\mathrm{sgn}(r), & |r| > \epsilon \end{cases}
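As a concrete sketch, the Huber penalty and its clip derivative might be coded as follows. This is a minimal NumPy sketch; the function names and the default threshold value are my own assumptions, not part of the note.

```python
import numpy as np

def huber(r, eps=1.0):
    """Huber penalty: quadratic (L2) for |r| <= eps, linear (L1) beyond."""
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= eps,
                    0.5 * r**2,                      # L2 region
                    eps * np.abs(r) - 0.5 * eps**2)  # L1 region

def clip(r, eps=1.0):
    """Derivative of the Huber penalty: the clip function."""
    return np.clip(r, -eps, eps)
```

At |r| = eps both branches give eps^2/2, so the penalty and its derivative are continuous across the crossover.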

Now let us set out to minimize a sum
of Huber functions of all the components of the residual,

Q(\alpha, \beta) = \sum_i H(r_i),

where the residual is perturbed by the addition
of a small amount of gradient g and previous step s. The perturbed residual is

r_i = \bar r_i + \alpha g_i + \beta s_i,

where we are given \bar r, g, and s, and
we seek to find \alpha and \beta by setting to zero the derivatives of Q by \alpha and \beta. For simplicity we assume that \alpha and \beta are small
and that we do not need to worry about components jumping between
the *L_2* and *L_1* portions of the Huber function.

(5)    0 = \frac{\partial Q}{\partial \alpha} = \sum_i g_i\,\mathrm{clip}(\bar r_i + \alpha g_i + \beta s_i)
       0 = \frac{\partial Q}{\partial \beta} = \sum_i s_i\,\mathrm{clip}(\bar r_i + \alpha g_i + \beta s_i)

For small \alpha and \beta the clip function linearizes: in the *L_2* region it passes its argument through, while in the *L_1* region it is locally constant, so

(6)    \mathrm{clip}(\bar r_i + \alpha g_i + \beta s_i) \approx \mathrm{clip}(\bar r_i) + (\alpha g_i + \beta s_i)\,[\,|\bar r_i| \le \epsilon\,]

Substituting (6) into (5) gives the linear pair

(7)    0 = \sum_i g_i\,\mathrm{clip}(\bar r_i) + \alpha \sum_{|\bar r_i| \le \epsilon} g_i^2 + \beta \sum_{|\bar r_i| \le \epsilon} g_i s_i
       0 = \sum_i s_i\,\mathrm{clip}(\bar r_i) + \alpha \sum_{|\bar r_i| \le \epsilon} g_i s_i + \beta \sum_{|\bar r_i| \le \epsilon} s_i^2

whose solution is

(8)    \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = -\begin{pmatrix} \sum g_i^2 & \sum g_i s_i \\ \sum g_i s_i & \sum s_i^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_i g_i\,\mathrm{clip}(\bar r_i) \\ \sum_i s_i\,\mathrm{clip}(\bar r_i) \end{pmatrix}

with the sums inside the matrix restricted to components in the *L_2* region.
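The linearized plane search amounts to one 2x2 solve. The following is a minimal NumPy sketch under the no-jumping assumption; the name `plane_search` and the dense-vector interface are my own, and it presumes the gradient and previous step are linearly independent on the *L_2* components.

```python
import numpy as np

def plane_search(r, g, s, eps=1.0):
    """Find alpha, beta minimizing sum_i H(r_i + alpha*g_i + beta*s_i),
    assuming no component jumps between the L2 and L1 regions."""
    clipped = np.clip(r, -eps, eps)   # clip(r), the Huber derivative
    in_l2 = np.abs(r) <= eps          # components currently in the L2 region
    gm, sm = g[in_l2], s[in_l2]
    A = np.array([[gm @ gm, gm @ sm],
                  [gm @ sm, sm @ sm]])
    b = -np.array([g @ clipped, s @ clipped])
    alpha, beta = np.linalg.solve(A, b)
    return alpha, beta
```

When every component lies in the *L_2* region, `clipped` equals `r` and this reduces to the familiar least-squares plane search.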

From the economical viewpoint, whether or not we would iterate for the values of \alpha and \beta would depend on whether it was costly to compute the new gradient, that is, whether the operator and its adjoint are costly to apply. If they are, we would want to make sure we got the most value from each gradient we had, so we would iterate the plane search for \alpha and \beta. Otherwise, if it was cheap to compute the next gradient, we would do so rather than making the best possible use of the existing gradient (by repeated plane search).
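For concreteness, here is a sketch of the cheap-gradient alternative: compute a fresh gradient each pass and do the plane search just once per pass, without inner iteration. It is an illustrative assumption-laden sketch, not the note's own coding: the operator is stood in for by a dense matrix `F`, and the names (`huber_descent`, the damping `delta` that guards the first pass, when the previous step is zero) are mine.

```python
import numpy as np

def huber_descent(F, d, eps=1.0, niter=10, delta=1e-12):
    """Minimize sum_i H((F m - d)_i): one linearized plane search per
    new gradient. F is a dense matrix standing in for the operator."""
    m = np.zeros(F.shape[1])      # model estimate
    s_m = np.zeros(F.shape[1])    # previous step in model space
    s_r = np.zeros(F.shape[0])    # its image in residual space
    for _ in range(niter):
        r = F @ m - d                        # current residual
        c = np.clip(r, -eps, eps)            # clip(r)
        dm = F.T @ c                         # gradient of sum_i H(r_i)
        g = F @ dm                           # gradient's image in residual space
        in_l2 = np.abs(r) <= eps             # components in the L2 region
        gm, sm = g[in_l2], s_r[in_l2]
        A = np.array([[gm @ gm + delta, gm @ sm],
                      [gm @ sm, sm @ sm + delta]])
        b = -np.array([g @ c, s_r @ c])
        alpha, beta = np.linalg.solve(A, b)  # the 2x2 plane search
        s_m = alpha * dm + beta * s_m        # new step, model space
        s_r = alpha * g + beta * s_r         # new step, residual space
        m = m + s_m
    return m
```

With `eps` large enough that every residual component stays in the *L_2* region, this degenerates to an ordinary least-squares conjugate-direction iteration.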

The economical viewpoint may be surpassed by the need to avoid trouble. Limited experience so far shows that instabilities can arise in going from one iteration to the next. We should be able to control them by iterating \alpha and \beta to convergence at each step. Failing that, I believe theory says we are assured stable convergence if we drop back from conjugate directions to steepest descent. All these extra precautions will require more than the straightforward coding below.

11/12/1997