Next: Distractions Up: Claerbout: CG Huber regression Previous: INTRODUCTION

# HUBER FUNCTION REGRESSION

I define the Huber function of each residual $R$ as

$$
H(R) =
\begin{cases}
\dfrac{R^2}{2}, & |R| \le h \\[4pt]
h\,|R| - \dfrac{h^2}{2}, & |R| > h
\end{cases}
\tag{2}
$$

For small residuals $R$, the Huber function reduces to the usual L2 least-squares penalty function, and for large $R$ it reduces to the usual robust (noise-insensitive) L1 penalty function. Notice the continuity at $|R| = h$, where the Huber function switches from its L2 range to its L1 range. Likewise, derivatives are continuous at the junctions $|R| = h$:

$$
\left.\frac{d}{dR}\,\frac{R^2}{2}\right|_{|R|=h} = h\,\operatorname{sgn}(R) = \left.\frac{d}{dR}\left(h\,|R| - \frac{h^2}{2}\right)\right|_{|R|=h}
\tag{3}
$$

The derivative of the Huber function is what we commonly call the clip function:

$$
\operatorname{clip}(R) = \frac{dH}{dR} =
\begin{cases}
R, & |R| \le h \\
h\,\operatorname{sgn}(R), & |R| > h
\end{cases}
\tag{4}
$$
In practice the clip function can be applied at a predetermined value $h$, or it can be applied at a percentile value of all the $R_i$.
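As a concrete check of the definitions above, here is a minimal NumPy sketch of the Huber penalty, its clip-function derivative, and a percentile choice of $h$; the function names and the 75th-percentile default are my own choices for illustration, not prescribed by the text:

```python
import numpy as np

def huber(R, h):
    """Huber penalty of eq. (2): quadratic for |R| <= h, linear beyond."""
    R = np.asarray(R, dtype=float)
    return np.where(np.abs(R) <= h, 0.5 * R**2, h * np.abs(R) - 0.5 * h**2)

def clip(R, h):
    """Derivative of the Huber penalty, the clip function of eq. (4)."""
    R = np.asarray(R, dtype=float)
    return np.where(np.abs(R) <= h, R, h * np.sign(R))

def percentile_h(R, q=75.0):
    """Set h at a percentile of the |R_i| instead of a fixed value."""
    return float(np.percentile(np.abs(R), q))
```

At the junction $|R| = h$ both branches of `huber` agree (each gives $h^2/2$), matching the continuity noted above.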

Now let us set out to minimize the sum of Huber functions of all the components of the residual, where the residual is perturbed by the addition of a small amount of the gradient $G$ and of the previous step $S$. The perturbed residual is $R_i + \alpha\,G_i + \beta\,S_i$, where we are given $R_i$, $G_i$, and $S_i$, and we seek to find $\alpha$ and $\beta$ by setting to zero the derivatives of $Q(\alpha,\beta) = \sum_i H(R_i + \alpha\,G_i + \beta\,S_i)$ with respect to $\alpha$ and $\beta$. For simplicity we assume that $\alpha$ and $\beta$ are small and that we do not need to worry about components jumping between the L2-range and L1-range portions of the Huber function. Obviously residual component values will often jump between the two ranges, and because of that, we must iterate the steps I define next:

$$
0 = \frac{\partial Q}{\partial \alpha} = \sum_i \operatorname{clip}(R_i + \alpha\,G_i + \beta\,S_i)\;G_i
\tag{5}
$$

$$
0 = \frac{\partial Q}{\partial \beta} = \sum_i \operatorname{clip}(R_i + \alpha\,G_i + \beta\,S_i)\;S_i
\tag{6}
$$
which reduces to two linear equations for $\alpha$ and $\beta$ (sums over L2 run over the components currently in the quadratic range, $|R_i| \le h$):

$$
\alpha \sum_{L2} G_i^2 \;+\; \beta \sum_{L2} G_i S_i \;=\; -\sum_i \operatorname{clip}(R_i)\,G_i
\tag{7}
$$

$$
\alpha \sum_{L2} G_i S_i \;+\; \beta \sum_{L2} S_i^2 \;=\; -\sum_i \operatorname{clip}(R_i)\,S_i
\tag{8}
$$
The nice thing about the plane search above is that it has an analytic solution given by the inverse of the $2\times 2$ matrix, and this solution is exact unless residual values have hopped between the L1 and L2 ranges, in which case another iteration is appropriate. To do another iteration, we would update the residual to $R_i + \alpha\,G_i + \beta\,S_i$, clip another time, and repeat the plane search. I hoped the exact solution would be obtained in a finite number of iterations, but lack of numerical precision in my test case made this uncertain.
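One pass of this plane search can be sketched as follows; this is my own minimal NumPy illustration, with `r` holding the residuals $R_i$, `g` the gradient direction $G_i$, and `s` the previous step $S_i$:

```python
import numpy as np

def plane_search(r, g, s, h):
    """Solve eqs. (7)-(8) for alpha, beta, assuming no residual
    component hops between the L2 and L1 ranges during the step."""
    in_l2 = np.abs(r) <= h                     # quadratic-range components
    A = np.array([[g[in_l2] @ g[in_l2], g[in_l2] @ s[in_l2]],
                  [g[in_l2] @ s[in_l2], s[in_l2] @ s[in_l2]]])
    c = np.where(in_l2, r, h * np.sign(r))     # clip(R_i), eq. (4)
    alpha, beta = np.linalg.solve(A, -np.array([c @ g, c @ s]))
    return alpha, beta
```

When every component sits in the L2 range, this reduces to the familiar least-squares plane search, since the clipped residual is then the residual itself.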

From the economical viewpoint, whether or not we would iterate for the values of $\alpha$ and $\beta$ depends on whether it is costly to compute the new gradient, that is, whether the operator and its adjoint are costly to apply. If they are, we would want to make sure we got the most value from each gradient we had, so we would iterate the plane search for $\alpha$ and $\beta$. Otherwise, if it were cheap to compute the next gradient, we would do so rather than making the best possible use of the existing gradient (by repeated plane search).

The economical viewpoint may be surpassed by the need to avoid trouble. Limited experience so far shows that instabilities can arise in going from one gradient to the next. We should be able to control them by iterating each plane search to convergence. Failing that, I believe theory says we are assured stable convergence if we drop back from conjugate directions to steepest descent. All these extra precautions will require more than the straightforward coding below.
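The iterate-to-convergence precaution can be sketched as a small fixed-point loop that repeats the plane search, reclassifying residual components between the L2 and L1 ranges after each update; the function name, iteration cap, and tolerance here are my choices, not Claerbout's:

```python
import numpy as np

def plane_search_converged(r, g, s, h, niter=20, tol=1e-12):
    """Repeat the (7)-(8) plane search, reclassifying residual
    components between the L2 and L1 ranges after each pass."""
    alpha = beta = 0.0
    for _ in range(niter):
        rp = r + alpha * g + beta * s            # current perturbed residual
        in_l2 = np.abs(rp) <= h                  # reclassify each pass
        A = np.array([[g[in_l2] @ g[in_l2], g[in_l2] @ s[in_l2]],
                      [g[in_l2] @ s[in_l2], s[in_l2] @ s[in_l2]]])
        c = np.where(in_l2, rp, h * np.sign(rp)) # clip of perturbed residual
        da, db = np.linalg.solve(A, -np.array([c @ g, c @ s]))
        alpha, beta = alpha + da, db + beta
        if abs(da) + abs(db) < tol:              # no further range hopping
            break
    return alpha, beta
```

When the classification stops changing, the increments vanish and the loop exits with the exact minimizer of the current plane search.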

Stanford Exploration Project
11/12/1997