
HUBER FUNCTION REGRESSION

I define the Huber function of each residual $R$ as
\begin{displaymath}
H(R) \quad =\quad
 \left\{
 \begin{array}{ll}
 {1\over 2}R^2 & \mbox{for $\vert R\vert \le h$} \\
 h\vert R\vert - {1\over 2}h^2 & \mbox{for $\vert R\vert > h$}
 \end{array} \right.
\end{displaymath} (2)

For small residuals $R$, the Huber function reduces to the usual L2 least-squares penalty function, and for large $R$ it reduces to the usual robust (noise-insensitive) L1 penalty function. Notice the continuity at $\vert R\vert = h$, where the Huber function switches from its L2 range to its L1 range. Likewise, the derivative is continuous at the junction $\vert R\vert = h$:

\begin{displaymath}
{dH\over dR}
\quad =\quad \left\{
 \begin{array}{ll}
 R & \mbox{for $\vert R\vert \le h$} \\
 h\,{\rm sgn}(R) & \mbox{for $\vert R\vert > h$}
 \end{array} \right.
\end{displaymath} (3)
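As a quick check of the continuity noted above, evaluating both branches of equations (2) and (3) at the junction $\vert R\vert = h$ gives the same value on either side:

\begin{displaymath}
{1\over 2}h^2 \quad =\quad h\cdot h - {1\over 2}h^2
\qquad \mbox{and} \qquad
R \quad =\quad h\,{\rm sgn}(R)
\qquad \mbox{at $\vert R\vert = h$.}
\end{displaymath}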

The derivative of the Huber function is what we commonly call the clip function, $dH/dR={\rm clip}(R)$:

\begin{displaymath}
{dH\over dR} \quad =\quad
{\rm clip}(R)
\quad =\quad \left\{
 \begin{array}{ll}
 -h & \mbox{for $R < -h$} \\
 R & \mbox{for $-h\le R\le h$} \\
 h & \mbox{for $h < R$}
 \end{array} \right.
\end{displaymath} (4)
In practice the clip function can be applied at a predetermined value $h$, or at a value of $h$ chosen as a percentile of all the $R_i$.
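As a concrete illustration of equations (2)-(4) (not part of the original paper), here is a small NumPy sketch; the function names, the sample values, and the 90th-percentile rule for choosing $h$ are all illustrative choices of my own:

import numpy as np

def huber(R, h):
    # Equation (2): quadratic for |R| <= h, linear h|R| - h^2/2 beyond.
    a = np.abs(R)
    return np.where(a <= h, 0.5 * R**2, h * a - 0.5 * h**2)

def clip(R, h):
    # Equation (4): dH/dR, the residual clipped to the interval [-h, h].
    return np.clip(R, -h, h)

R = np.array([-3.0, -0.4, 0.2, 2.5])
print(clip(R, h=1.0))            # [-1.  -0.4  0.2  1. ]
print(huber(R, h=1.0).sum())     # mixes L2 and L1 contributions

# Alternatively, h can be set from a percentile of the residuals,
# for example the 90th percentile of |R_i| (an illustrative choice):
h = np.percentile(np.abs(R), 90)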

Now let us set out to minimize a sum of Huber functions of all the components of the residual, $\sum_i H(R_i)$, where the residual $\bold R$ is perturbed by the addition of a small amount of gradient $\bold G$ and previous step $\bold S$. The perturbed residual is $\bold R + \alpha \bold G + \beta \bold S$, where we are given $\bold R$, $\bold G$, and $\bold S$, and we seek $\alpha$ and $\beta$ by setting to zero the derivatives of $\sum_i H(R_i)$ with respect to $\alpha$ and $\beta$. For simplicity we assume that $\alpha$ and $\beta$ are small, so that we need not worry about components jumping between the L2-range and L1-range portions of the Huber function. Obviously residual components will often jump between the two ranges, and because of that we must iterate the steps I define next:

\begin{eqnarray}
0 & = & \sum_i {\partial H\over\partial R_i}\,{\partial R_i\over\partial\alpha}
 \ =\ \sum_{\vert R_i\vert \le h} (R_i + \alpha G_i + \beta S_i)\, G_i
 \ +\ \sum_{\vert R_i\vert > h} h\,{\rm sgn}(R_i)\, G_i \\
0 & = & \sum_i {\partial H\over\partial R_i}\,{\partial R_i\over\partial\beta}
 \ =\ \sum_{\vert R_i\vert \le h} (R_i + \alpha G_i + \beta S_i)\, S_i
 \ +\ \sum_{\vert R_i\vert > h} h\,{\rm sgn}(R_i)\, S_i
\end{eqnarray} (5) (6)
which reduces to two equations for $\alpha$ and $\beta$:
\begin{eqnarray}
\left\{
\sum_{\vert R_i\vert \le h}
\left[
 \begin{array}{cc}
 G_i G_i & G_i S_i \\
 S_i G_i & S_i S_i
 \end{array}
\right]
\right\}
\left[
 \begin{array}{c}
 \alpha \\
 \beta
 \end{array}
\right]
& = &
-\ \sum_i
\left[
 \begin{array}{c}
 G_i\,{\rm clip}(R_i) \\
 S_i\,{\rm clip}(R_i)
 \end{array}
\right]
\end{eqnarray} (7) (8)
The nice thing about the plane search above is that it has an analytic solution given by the inverse of the $2\times 2$ matrix, and this solution is exact unless residual values have hopped between the L1 and L2 ranges, in which case another iteration is appropriate. To do another iteration, we update $\bold R \leftarrow \bold R + \alpha \bold G + \beta \bold S$, clip $\bold R$ again, and repeat the plane search. I had hoped that the exact solution would be obtained in a finite number of iterations, but lack of numerical precision in my test case left this uncertain.
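To make the plane-search recipe concrete, here is a NumPy sketch of the iterated $2\times 2$ solve of equations (7)-(8); the routine name, the use of a pseudo-inverse (which also covers a first step in which $\bold S$ is still zero), and the fixed repeat count are my own choices, not the paper's code:

def plane_search(R, G, S, h, nrepeat=5):
    # Solve the 2x2 system of equations (7)-(8) over the L2-range
    # components, update R <- R + a G + b S, re-clip, and repeat in
    # case components hopped between the L1 and L2 ranges.
    R = np.array(R, dtype=float)            # work on a copy
    alpha = beta = 0.0
    for _ in range(nrepeat):
        l2 = np.abs(R) <= h                 # components in the L2 range
        A = np.array([[np.dot(G[l2], G[l2]), np.dot(G[l2], S[l2])],
                      [np.dot(S[l2], G[l2]), np.dot(S[l2], S[l2])]])
        rhs = -np.array([np.dot(G, np.clip(R, -h, h)),
                         np.dot(S, np.clip(R, -h, h))])
        a, b = np.linalg.pinv(A) @ rhs      # analytic 2x2 solution
        R += a * G + b * S
        alpha += a
        beta += b
        if abs(a) + abs(b) < 1e-12:         # no components changed range
            break
    return alpha, beta, R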

From the economic viewpoint, whether or not we would iterate for the values of $\alpha$ and $\beta$ depends on whether it is costly to compute the new gradient $\bold G=\bold F \bold F'\,{\rm clip}(\bold R)$, that is, whether $\bold F$ and $\bold F'$ are costly to apply. If they are, we would want to make sure we got the most value from each $\bold G$ we had, so we would iterate the plane search for $(\alpha,\beta)$. Otherwise, if it were cheap to compute the next gradient $\bold G=\bold F \bold F'\,{\rm clip}(\bold R)$, we would do so rather than making the best possible use of the existing gradient (by repeated plane search).

The economic viewpoint may be outweighed by the need to avoid trouble. Limited experience so far shows that instabilities can arise in going from one $\bold G$ to the next. We should be able to control them by iterating to convergence for each $\bold G$. Failing that, I believe theory says we are assured of stable convergence if we drop back from conjugate directions to steepest descent. All these extra precautions will require more than the straightforward coding below.
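The author's code is not reproduced in this section, but to show how the pieces fit together, here is one possible shape for the outer loop, continuing the NumPy sketches above. It assumes $\bold F$ is given as a dense matrix, defines the residual as $\bold R = \bold F \bold m - \bold d$, and omits the safeguards (convergence tests, fall-back to steepest descent) discussed in the text:

def huber_regression(F, d, h, niter=50):
    # Sketch: gradient g = F' clip(R), its image G = F g in residual
    # space, plane search for (alpha, beta), then a joint update of
    # the model m and residual R.
    m = np.zeros(F.shape[1])
    R = F @ m - d                       # residual R = F m - d
    s = np.zeros_like(m)                # previous model-space step
    S = np.zeros_like(R)                # previous residual-space step
    for _ in range(niter):
        g = F.T @ np.clip(R, -h, h)     # model-space gradient F' clip(R)
        G = F @ g                       # residual-space gradient F F' clip(R)
        alpha, beta, R = plane_search(R, G, S, h)
        s = alpha * g + beta * s        # new step in model space
        S = alpha * G + beta * S        # the same step in residual space
        m = m + s
    return m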



 