Next: Distractions Up: Claerbout: CG Huber regression Previous: INTRODUCTION

# HUBER FUNCTION REGRESSION

I define the Huber function of each residual $R$ as

$$
H(R) =
\begin{cases}
\dfrac{R^2}{2}, & |R| \le h \\[4pt]
h\,|R| - \dfrac{h^2}{2}, & |R| > h
\end{cases}
\tag{2}
$$

For small residuals $R$, the Huber function reduces to the usual L2 least-squares penalty function, and for large $R$ it reduces to the usual robust (noise-insensitive) L1 penalty function. Notice the continuity at $|R| = h$, where the Huber function switches from its L2 range to its L1 range. Likewise, derivatives are continuous at the junctions $|R| = h$:

$$
\left.\frac{d}{dR}\,\frac{R^2}{2}\right|_{|R|=h} = h\,\operatorname{sgn}(R) = \left.\frac{d}{dR}\left(h\,|R| - \frac{h^2}{2}\right)\right|_{|R|=h}
\tag{3}
$$

The derivative of the Huber function is what we commonly call the clip function:

$$
\operatorname{clip}(R) = \frac{dH}{dR} =
\begin{cases}
R, & |R| \le h \\
h\,\operatorname{sgn}(R), & |R| > h
\end{cases}
\tag{4}
$$
In practice the clip function can be applied at a predetermined value $h$, or it can be applied at a percentile value of all the $R_i$.
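As a concrete check of the definitions above, here is a minimal NumPy sketch of the Huber penalty, its clip-function derivative, and a percentile choice of $h$; the function names and the 75th-percentile default are my own choices for illustration, not prescribed by the text:

```python
import numpy as np

def huber(R, h):
    """Huber penalty of eq. (2): quadratic for |R| <= h, linear beyond."""
    R = np.asarray(R, dtype=float)
    return np.where(np.abs(R) <= h, 0.5 * R**2, h * np.abs(R) - 0.5 * h**2)

def clip(R, h):
    """Derivative of the Huber penalty, the clip function of eq. (4)."""
    R = np.asarray(R, dtype=float)
    return np.where(np.abs(R) <= h, R, h * np.sign(R))

def percentile_h(R, q=75.0):
    """Set h at a percentile of the |R_i| instead of a fixed value."""
    return float(np.percentile(np.abs(R), q))
```

At the junction $|R| = h$ both branches of `huber` agree (each gives $h^2/2$), matching the continuity noted above.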

Now let us set out to minimize the sum of Huber functions of all the components of the residual, where the residual is perturbed by the addition of a small amount of the gradient $G$ and of the previous step $S$. The perturbed residual is $R_i + \alpha\,G_i + \beta\,S_i$, where we are given $R_i$, $G_i$, and $S_i$, and we seek to find $\alpha$ and $\beta$ by setting to zero the derivatives of $Q(\alpha,\beta) = \sum_i H(R_i + \alpha\,G_i + \beta\,S_i)$ with respect to $\alpha$ and $\beta$. For simplicity we assume that $\alpha$ and $\beta$ are small and that we do not need to worry about components jumping between the L2-range and L1-range portions of the Huber function. Obviously residual component values will often jump between the two ranges, and because of that, we must iterate the steps I define next:

$$
0 = \frac{\partial Q}{\partial \alpha} = \sum_i \operatorname{clip}(R_i + \alpha\,G_i + \beta\,S_i)\;G_i
\tag{5}
$$

$$
0 = \frac{\partial Q}{\partial \beta} = \sum_i \operatorname{clip}(R_i + \alpha\,G_i + \beta\,S_i)\;S_i
\tag{6}
$$
which reduces to two linear equations for $\alpha$ and $\beta$ (sums over L2 run over the components currently in the quadratic range, $|R_i| \le h$):

$$
\alpha \sum_{L2} G_i^2 \;+\; \beta \sum_{L2} G_i S_i \;=\; -\sum_i \operatorname{clip}(R_i)\,G_i
\tag{7}
$$

$$
\alpha \sum_{L2} G_i S_i \;+\; \beta \sum_{L2} S_i^2 \;=\; -\sum_i \operatorname{clip}(R_i)\,S_i
\tag{8}
$$
The nice thing about the plane search above is that it has an analytic solution given by the inverse of the $2\times 2$ matrix, and this solution is exact unless residual values have hopped between the L1 and L2 ranges, in which case another iteration is appropriate. To do another iteration, we would update the residual to $R_i + \alpha\,G_i + \beta\,S_i$, clip another time, and repeat the plane search. I hoped the exact solution would be obtained in a finite number of iterations, but lack of numerical precision in my test case made this uncertain.
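One pass of this plane search can be sketched as follows; this is my own minimal NumPy illustration, with `r` holding the residuals $R_i$, `g` the gradient direction $G_i$, and `s` the previous step $S_i$:

```python
import numpy as np

def plane_search(r, g, s, h):
    """Solve eqs. (7)-(8) for alpha, beta, assuming no residual
    component hops between the L2 and L1 ranges during the step."""
    in_l2 = np.abs(r) <= h                     # quadratic-range components
    A = np.array([[g[in_l2] @ g[in_l2], g[in_l2] @ s[in_l2]],
                  [g[in_l2] @ s[in_l2], s[in_l2] @ s[in_l2]]])
    c = np.where(in_l2, r, h * np.sign(r))     # clip(R_i), eq. (4)
    alpha, beta = np.linalg.solve(A, -np.array([c @ g, c @ s]))
    return alpha, beta
```

When every component sits in the L2 range, this reduces to the familiar least-squares plane search, since the clipped residual is then the residual itself.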

From the economical viewpoint, whether or not we would iterate for the values of $\alpha$ and $\beta$ depends on whether it is costly to compute the new gradient, that is, whether the operator and its adjoint are costly to apply. If they are, we would want to make sure we got the most value from each gradient we had, so we would iterate the plane search for $\alpha$ and $\beta$. Otherwise, if it were cheap to compute the next gradient, we would do so rather than making the best possible use of the existing gradient (by repeated plane search).

The economical viewpoint may be surpassed by the need to avoid trouble. Limited experience so far shows that instabilities can arise in going from one gradient to the next. We should be able to control them by iterating each plane search to convergence. Failing that, I believe theory says we are assured stable convergence if we drop back from conjugate directions to steepest descent. All these extra precautions will require more than the straightforward coding below.
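The iterate-to-convergence precaution can be sketched as a small fixed-point loop that repeats the plane search, reclassifying residual components between the L2 and L1 ranges after each update; the function name, iteration cap, and tolerance here are my choices, not Claerbout's:

```python
import numpy as np

def plane_search_converged(r, g, s, h, niter=20, tol=1e-12):
    """Repeat the (7)-(8) plane search, reclassifying residual
    components between the L2 and L1 ranges after each pass."""
    alpha = beta = 0.0
    for _ in range(niter):
        rp = r + alpha * g + beta * s            # current perturbed residual
        in_l2 = np.abs(rp) <= h                  # reclassify each pass
        A = np.array([[g[in_l2] @ g[in_l2], g[in_l2] @ s[in_l2]],
                      [g[in_l2] @ s[in_l2], s[in_l2] @ s[in_l2]]])
        c = np.where(in_l2, rp, h * np.sign(rp)) # clip of perturbed residual
        da, db = np.linalg.solve(A, -np.array([c @ g, c @ s]))
        alpha, beta = alpha + da, db + beta
        if abs(da) + abs(db) < tol:              # no further range hopping
            break
    return alpha, beta
```

When the classification stops changing, the increments vanish and the loop exits with the exact minimizer of the current plane search.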

Stanford Exploration Project
11/12/1997