In the work reported here, I use a hybrid l^{1}-l^{2} error measure proposed by Huber (1973):

E_H(r) =
  \begin{cases}
    \frac{1}{2} r^2 & \text{if } |r| \le \epsilon, \\
    \epsilon |r| - \frac{1}{2} \epsilon^2 & \text{if } |r| > \epsilon.
  \end{cases}

I will call E_H the Huber misfit function, or Huber function for short (Figure 1). Note that the Huber function is smooth near zero residual and weights small residuals by the mean square. It is reasonable to suppose that the Huber function, while maintaining robustness against large residuals, is easier to minimize than l^{1}. The parameter \epsilon, which controls the limit between l^{1} and l^{2} behavior, is called the Huber threshold.
[Figure 1: Error measure proposed by Huber (1973). The upper part, above the threshold, is the l^{1} norm, while the lower part is the l^{2} norm.]
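As a concrete sketch, the two branches of the Huber measure can be evaluated as follows (Python/NumPy; the function name `huber` and the threshold variable `eps` are illustrative, not from the original code):

```python
import numpy as np

def huber(r, eps):
    """Huber misfit of a residual array r with threshold eps.

    Quadratic (l2) for |r| <= eps, linear (l1) beyond.
    """
    a = np.abs(r)
    quad = 0.5 * r**2                 # small residuals: mean-square weighting
    lin = eps * a - 0.5 * eps**2      # large residuals: linear growth
    return np.where(a <= eps, quad, lin)
```

At |r| = eps the two branches agree in both value (eps^2/2) and slope (eps), which is why the measure is smooth at the transition.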

 
The problem is this: given some observed data d, we want to find the best model m that explains the data via the operator L. This may be posed as an inverse problem leading to the minimization of the function

f(m) = E(Lm - d),     (1)

where E is an error measure function. As discussed above, I will use the Huber function and try to minimize

f(m) = \sum_i E_H\big( (Lm - d)_i \big).     (2)
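For a dense linear operator, the misfit and its gradient can be sketched as follows (assuming the residual convention r = Lm - d; all names are illustrative):

```python
import numpy as np

def huber_misfit(m, L, d, eps):
    """Value and gradient of f(m) = sum_i E_H((L m - d)_i)."""
    r = L @ m - d
    a = np.abs(r)
    f = np.sum(np.where(a <= eps, 0.5 * r**2, eps * a - 0.5 * eps**2))
    # E_H'(r) clips the residual at +/- eps: large residuals exert only
    # a bounded pull on the model, which is the source of the robustness.
    grad = L.T @ np.clip(r, -eps, eps)
    return f, grad
```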
We seek to find a stationary point of the function f. This solution point satisfies \nabla f(m) = 0. This is a nonlinear system of equations, and from a Taylor expansion we have

\nabla f(m + \Delta m) \approx \nabla f(m) + H \Delta m

if \Delta m is sufficiently small, where H is the Hessian of f. Newton's method consists of finding \Delta m_k such that

H_k \Delta m_k = -\nabla f(m_k)

and computing the next iterate by

m_{k+1} = m_k + \alpha_k \Delta m_k,     (3)

where \alpha_k is a steplength given by a line-search algorithm. The general process of the program is then
 1.
 compute the gradient \nabla f(m_k)
 2.
 compute \Delta m_k = -H_k^{-1} \nabla f(m_k)
 3.
 compute \alpha_k using a line search
 4.
 update the solution m_{k+1} = m_k + \alpha_k \Delta m_k
 5.
 update the Hessian approximation H_{k+1}
 6.
 go to 1.
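In steps 2 and 5 the Hessian is never formed explicitly: the limited-memory scheme keeps only the most recent pairs s_k = m_{k+1} - m_k and y_k = \nabla f(m_{k+1}) - \nabla f(m_k), and applies the inverse-Hessian approximation to the gradient with the two-loop recursion. A sketch (a hypothetical helper, not the original code):

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Search direction -H_k^{-1} grad via the l-BFGS two-loop recursion.

    s_list, y_list hold the most recent step / gradient-change pairs
    (oldest first); with no stored pairs this reduces to steepest descent.
    """
    q = g.astype(float).copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # Backward pass: newest pair first.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    if s_list:
        # Scale the initial inverse Hessian using the newest pair.
        gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
        q *= gamma
    # Forward pass: oldest pair first.
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ q)
        q += (a - b) * s
    return -q
```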
Because the Huber function is not twice continuously differentiable, the Hessian is not computed directly but approximated using a limited-memory BFGS update (Guitton, 2000), as proposed by Nocedal (1980) and Liu and Nocedal (1989). For the line search I have implemented the Moré and Thuente (1994) algorithm, which ensures sufficient decrease of the function f and obeys a curvature condition (the so-called Wolfe conditions; Kelley, 1999), thus guaranteeing that a quasi-Newton update is possible.
The steplength \alpha_k = 1 is always tried first, saving significant computation. These choices lead to good performance in both convergence rate and computational cost. I call the algorithm detailed above, when used with the Huber norm, the ``Huber solver''.
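The two line-search requirements can be checked directly; a sketch of the test, with the customary constants c1 and c2 (all names hypothetical):

```python
import numpy as np

def wolfe_conditions(f0, g0, f_a, g_a, alpha, p, c1=1e-4, c2=0.9):
    """Check the weak Wolfe conditions for a trial step alpha along p.

    f0, g0: objective value and gradient at the current point;
    f_a, g_a: value and gradient at the trial point m + alpha * p.
    """
    d0 = float(g0 @ p)                     # directional derivative at alpha = 0
    sufficient_decrease = f_a <= f0 + c1 * alpha * d0
    curvature = float(g_a @ p) >= c2 * d0  # rules out overly short steps
    return bool(sufficient_decrease and curvature)
```

The curvature condition forces y_k \cdot s_k > 0, which is exactly what keeps the quasi-Newton (BFGS) update well defined and positive definite.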
Stanford Exploration Project
4/27/2000