

Multivariate $\ell^1$ estimation by iterated reweighting

The easiest method of model fitting is linear least squares, which minimizes the sum of squared residuals ($\ell^2$). On the other hand, we often encounter huge noises, and then it is much safer to minimize the sum of absolute values of residuals ($\ell^1$). (The problem with $\ell^0$ is that there are multiple minima, so the gradient is not a sensible way to seek the deepest.)

There exist specialized techniques for handling $\ell^1$ multivariate fitting problems. They should work better than the simple iterative reweighting outlined here.

A penalty function that ranges from $\ell^2$ to $\ell^1$, depending on the constant $\bar r$, is

$\begin{displaymath} E(\bold r) \eq \sum_i \left( \sqrt{1+r_i^2/\bar r^2} - 1 \right)\end{displaymath}$ (9)

Where $\vert r_i/\bar r\vert$ is small, each term in the sum is approximately $r_i^2/2\bar r^2$, and where $\vert r_i/\bar r\vert$ is large, each term is approximately $\vert r_i/\bar r\vert$. We define the residual as

$\begin{displaymath} r_i \eq \sum_j \ F_{ij}m_j - d_i\end{displaymath}$ (10)
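Equations (9) and (10) can be evaluated together as a quick numerical check of the two regimes. The following is a minimal sketch in Python with NumPy; the function names `residual` and `hybrid_penalty` and the test values are illustrative assumptions, not part of the original text.

```python
import numpy as np

def residual(F, m, d):
    """Residual r = F m - d, as in equation (10)."""
    return F @ m - d

def hybrid_penalty(r, rbar):
    """Penalty E(r) = sum_i (sqrt(1 + r_i^2 / rbar^2) - 1), as in equation (9)."""
    return np.sum(np.sqrt(1.0 + (r / rbar) ** 2) - 1.0)

rbar = 1.0                 # assumed crossover scale, chosen only for illustration
small = np.array([1e-3])   # |r| much smaller than rbar: quadratic regime
large = np.array([1e3])    # |r| much larger than rbar: absolute-value regime

# Each line prints the penalty next to the limiting form it should approach.
print(hybrid_penalty(small, rbar), small[0] ** 2 / (2 * rbar ** 2))
print(hybrid_penalty(large, rbar), abs(large[0]) / rbar)
```

The constant $\bar r$ marks the residual size at which the penalty switches from quadratic to linear growth.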

We will need

$\begin{displaymath} {\partial r_i\over\partial m_k} \eq \sum_j \ F_{ij} \delta_{jk} \eq F_{ik}\end{displaymath}$ (11)

where we briefly used the notation that $\delta_{jk}$ is 1 when $j=k$ and zero otherwise. Now, to find the descent direction $\Delta \bold m$, we compute the $k$-th component $g_k$ of the gradient $\bold g$. We have

$\begin{displaymath} g_k \eq {\partial E \over\partial m_k} \eq \sum_i \ {1\over\sqrt{1+r_i^2/\bar r^2}}\ { r_i \over \bar r^2} \ {\partial r_i\over\partial m_k}\end{displaymath}$ (12)

$\begin{displaymath} \bold g \eq \Delta\bold m \eq \bold F' \ {\bf diag} \left( {1\over\sqrt{1+r_i^2/\bar r^2}} \right) \bold r\end{displaymath}$ (13)

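where we have used the notation ${\bf diag}()$ to designate a diagonal matrix with its argument distributed along the diagonal. In (13) the constant factor $1/\bar r^2$ that appears in (12) has been dropped, because it changes only the length of $\bold g$, not its direction.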
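As a sanity check on (12) and (13), the weighted gradient can be compared with a finite-difference gradient of the penalty (9). This is a sketch under stated assumptions: the names `penalty`, `gradient_direction`, and `finite_difference_gradient`, the random test problem, and the value of `rbar` are all hypothetical, and $\bold F'$ is taken to be the transpose of a real matrix.

```python
import numpy as np

def penalty(F, m, d, rbar):
    """E = sum_i (sqrt(1 + r_i^2/rbar^2) - 1) with r = F m - d."""
    r = F @ m - d
    return np.sum(np.sqrt(1.0 + (r / rbar) ** 2) - 1.0)

def gradient_direction(F, m, d, rbar):
    """g = F' diag(1/sqrt(1 + r_i^2/rbar^2)) r, as in equation (13)."""
    r = F @ m - d
    w = 1.0 / np.sqrt(1.0 + (r / rbar) ** 2)   # diagonal of the weighting matrix
    return F.T @ (w * r)

def finite_difference_gradient(F, m, d, rbar, h=1e-6):
    """Central-difference approximation to dE/dm_k, one component at a time."""
    g = np.zeros_like(m)
    for k in range(m.size):
        step = np.zeros_like(m)
        step[k] = h
        g[k] = (penalty(F, m + step, d, rbar)
                - penalty(F, m - step, d, rbar)) / (2.0 * h)
    return g

rng = np.random.default_rng(0)       # fixed seed so the check is repeatable
F = rng.standard_normal((8, 3))      # illustrative operator
d = rng.standard_normal(8)           # illustrative data
m = rng.standard_normal(3)           # illustrative model
rbar = 0.5

g = gradient_direction(F, m, d, rbar)
g_fd = finite_difference_gradient(F, m, d, rbar)
# Equation (12) carries the factor 1/rbar^2 that equation (13) omits.
print(np.max(np.abs(g / rbar**2 - g_fd)))
```

Dividing `g` by `rbar**2` restores the scale factor of (12), so the two gradients should agree to within the finite-difference error.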

Continuing, we notice that the new weighting of residuals has nothing to do with the linear relation between model perturbation and residual perturbation; that is, we retain the familiar relations, $\bold r = \bold F \bold m -\bold d$ and $\Delta\bold r = \bold F \Delta\bold m$ .

In practice we have the question of how to choose $\bar r$. I suggest that $\bar r$ be proportional to ${\rm median}(\vert r_i\vert)$ or some other percentile.
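One way to put these pieces together is sketched below. It is not the conjugate-direction solver of the neighboring sections: for brevity it re-solves each reweighted problem with a dense least-squares call (`numpy.linalg.lstsq`), which is reasonable only for small problems. The function name `irls_hybrid`, the proportionality constant `scale=1.0`, the iteration count, the small `floor` on $\bar r$, and the synthetic line-fitting data are all assumptions made for illustration.

```python
import numpy as np

def irls_hybrid(F, d, n_iter=20, scale=1.0, floor=1e-12):
    """Iterated reweighting for the hybrid l2/l1 penalty of equation (9).

    Each pass freezes the weights 1/sqrt(1 + r_i^2/rbar^2) and solves the
    resulting weighted least-squares problem F' W F m = F' W d.
    """
    m, *_ = np.linalg.lstsq(F, d, rcond=None)     # start from the plain l2 fit
    for _ in range(n_iter):
        r = F @ m - d
        # rbar proportional to the median absolute residual; the floor only
        # guards against division by zero when the fit is exact.
        rbar = max(scale * np.median(np.abs(r)), floor)
        w = 1.0 / np.sqrt(1.0 + (r / rbar) ** 2)
        s = np.sqrt(w)                            # scale rows by sqrt(weight)
        m, *_ = np.linalg.lstsq(s[:, None] * F, s * d, rcond=None)
    return m

# Synthetic example: a straight line with small noise and two large outliers.
rng = np.random.default_rng(0)
x = np.arange(20.0)
F = np.column_stack([np.ones_like(x), x])
m_true = np.array([2.0, 3.0])
d = F @ m_true + 0.1 * rng.standard_normal(x.size)
d[3] += 100.0
d[15] += 100.0

m_l2, *_ = np.linalg.lstsq(F, d, rcond=None)
m_robust = irls_hybrid(F, d)
print("l2 fit:        ", m_l2)
print("reweighted fit:", m_robust)
```

Because $\bar r$ is recomputed from the current residuals on every pass, the weights on the outliers shrink as the fit improves, so the reweighted estimate should sit much closer to `m_true` than the plain least-squares fit does.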


Stanford Exploration Project
4/27/2004