
INTRODUCTION

The noise variance of data of all kinds is often nonstationary and difficult to characterize. This observation pushes us away from least squares, with its requirement that we estimate the inverse noise variance/covariance, and toward robust statistical procedures, which are less sensitive to imperfect knowledge of the noise variance.

I seek to provide an electronic textbook with dozens of readily reproducible examples of the application of geophysical inverse theory. On my sabbatical leave during the academic year '93-'94, I gathered four wide-ranging geophysical data sets, depth sounding (Galilee), radar (Vesuvius), satellite altimetry (Madagascar), and Seabeam (ocean ridge), and began preparing TDF, ``Three-dimensional filtering: Environmental soundings image enhancement''. The first time I taught this material I discovered that three of the four data sets had problems with severe noise, and the fourth had missing values where someone had previously edited out the bad points. Turning a vice into a virtue, I realized that no guide to data fitting is satisfactory without a collection of useful tools for dealing with very noisy data.

I tried to iteratively reweight residuals by the inverse of the observed residual variance. Unfortunately, such reweighting is an art that is difficult to teach, difficult to reuse, and often fails. Iterative reweighting encourages ad hoc solutions that defy the formal optimization literature. Yet it can hardly be avoided, since real data is noisy and its variance is usually nonstationary. We should be able to greatly reduce the need for ad hoc reweighting if we use robust methods.
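To make the reweighting idea concrete, here is a minimal sketch of one common form of it, iteratively reweighted least squares; the function name, the damping constant `delta`, and the toy data are my own illustration, not from the text:

```python
import numpy as np

def irls(F, d, niter=10, delta=1e-6):
    """Iteratively reweighted least squares: weight each equation by the
    inverse magnitude of its residual from the previous pass (delta guards
    against division by zero)."""
    m = np.linalg.lstsq(F, d, rcond=None)[0]   # plain L2 starting guess
    for _ in range(niter):
        w = 1.0 / np.maximum(np.abs(F @ m - d), delta)
        Fw = F * w[:, None]                     # scale each row by its weight
        m = np.linalg.lstsq(Fw, w * d, rcond=None)[0]
    return m

# Fitting a constant to data with one large outlier (hypothetical values):
# plain least squares returns the mean (20.8), badly pulled by the outlier,
# while the reweighted estimate moves toward the median-like fit near 1.0.
F = np.ones((5, 1))
d = np.array([1.0, 1.1, 0.9, 1.0, 100.0])
print(irls(F, d))
```

This illustrates both the appeal and the fragility the text describes: the answer depends on choices such as `delta` and the number of passes, which is exactly the ad hoc tuning one would like to avoid.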

Theoretically, it is well known (FGDP) that the L1 norm approach tolerates a data set containing some infinite data values (noise). A straightforward approach to geophysical inversion (model fitting or regression) is a mixed L1 and L2 criterion for fitting data $\bold d$ by a model $\bold m$, where
\begin{displaymath}
\min_{\bold m} \quad
\left( \sum_{n_d} \vert \bold F \bold m - \bold d \vert \right)
\;+\;
\bold m' \bold A' \bold A \bold m
\end{displaymath} (1)
and where $\bold F$ links a model to its implied data, and $\bold A$ is a model roughener. If we had a rapid solution method to equation (1), we would probably use it and be quite happy. Unfortunately, we do not have such a method.
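As a numeric illustration of what equation (1) measures, the sketch below evaluates the mixed objective for a tiny hypothetical problem; the matrices and vectors are invented stand-ins, not data from the text:

```python
import numpy as np

def mixed_objective(m, F, d, A):
    """Equation (1): L1 data misfit plus quadratic model roughness m' A' A m."""
    residual = F @ m - d
    return np.sum(np.abs(residual)) + m @ (A.T @ A) @ m

# Toy example (all values hypothetical):
F = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # model-to-data map
d = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, -1.0]])                          # first-difference roughener
m = np.array([1.0, 2.0])
# This m fits the data exactly, so only the roughness term (1-2)^2 = 1 remains:
print(mixed_objective(m, F, d, A))   # 1.0
```

The difficulty the text alludes to is not evaluating this objective but minimizing it: the L1 term is not differentiable at zero residual, which is what defeats the standard quadratic machinery.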

I did some preliminary tests on a regression like equation (1) (but all L1). I was disappointed that twenty times more iterations than usual were required, and each iteration was significantly more costly.[*] From this preliminary experience I sought an approach that would be nearly identical to our customary L2 approach, but which would adapt itself to a moderate number of large noise values. The gentler approach I adopt here is based on the criterion of Huber (a professor of statistics at Harvard): residuals are penalized by their square when they are small (L2) and by their absolute value when they are large (L1).
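A minimal sketch of that penalty (the function name and the threshold parameter `eps` are my choices; Huber's criterion itself fixes only the quadratic-inside, linear-outside form):

```python
import numpy as np

def huber(r, eps=1.0):
    """Huber penalty: r**2/2 for |r| <= eps, eps*(|r| - eps/2) otherwise.
    The linear branch is offset so value and slope match at |r| = eps."""
    r = np.asarray(r, dtype=float)
    small = np.abs(r) <= eps
    return np.where(small, 0.5 * r**2, eps * (np.abs(r) - 0.5 * eps))

# Small residuals behave like L2, large ones like L1:
print(huber(0.5))    # quadratic region: 0.5 * 0.25 = 0.125
print(huber(10.0))   # linear region: 1.0 * (10 - 0.5) = 9.5
```

Because the penalty grows only linearly for large residuals, a few wild data values contribute bounded gradients instead of dominating the fit as they would under least squares.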

We will see that with the Huber approach, the conjugate-direction method is only slightly changed. A big question is whether the convergence rate remains attractive. This is important for geophysical problems, which are usually large.
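A sketch of why the change is slight, under my reading of the text: the gradient of the summed Huber penalties is the least-squares gradient F'r with the residuals clipped at the Huber threshold, so only the residual-to-gradient step of each iteration differs. Names and data below are hypothetical:

```python
import numpy as np

def huber_gradient(F, m, d, eps=1.0):
    """Gradient of the summed Huber penalties of residuals r = F m - d.
    Identical to the L2 gradient F' r except residuals are clipped at +/- eps,
    so large (outlier) residuals contribute only a bounded amount."""
    r = F @ m - d
    clipped = np.clip(r, -eps, eps)   # the only change from least squares
    return F.T @ clipped

# With residuals inside the threshold, the gradient matches least squares:
F = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([0.2, -0.3])
m = np.zeros(2)
print(huber_gradient(F, m, d))        # same as F.T @ (F @ m - d)
```

Everything else in a conjugate-direction iteration, such as forming search directions from gradients and past steps, can proceed unchanged, which is what makes the convergence-rate question the interesting one.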


Stanford Exploration Project
11/12/1997