** Next:** CONCLUSIONS
** Up:** Gilles Darche: *L*^{1} Deconvolution
** Previous:** EXAMPLE ON REAL DATA

Though the theoretical problem is an *L*^{p} norm minimization, the introduction of
the parameter ε changes the theoretical solution. In fact, the limit
*x*_{lim} of the algorithm will solve the normal equations:

| Σ_{i} *A*(*i*,*j*) *w*(*i*) *r*(*i*) = 0   for every *j* | (10) |

where:
*r*(*i*) = (*y* - *A*.*x*)(*i*)

| *w*(*i*) = |*r*(*i*)|^{p-2} if |*r*(*i*)| > ε,   *w*(*i*) = ε^{p-2} if |*r*(*i*)| ≤ ε | (11) |
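As a quick sketch, the weight rule of equation (11) can be written in a few lines of Python (the function and variable names are mine):

```python
import numpy as np

def irls_weights(r, p, eps):
    """IRLS weights of equation (11): |r(i)|^(p-2) where |r(i)| > eps,
    and the clamped value eps^(p-2) inside the band |r(i)| <= eps."""
    r = np.asarray(r, dtype=float)
    w = np.empty_like(r)
    big = np.abs(r) > eps
    w[big] = np.abs(r[big]) ** (p - 2)
    w[~big] = eps ** (p - 2)
    return w
```

For *p* = 1, residuals above ε get weight 1/|*r*(*i*)|, while all residuals inside the band share the fixed weight 1/ε.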

These equations are the normal equations of the following minimization problem:

| min_{x} *F*(*x*) = Σ_{|*r*(*i*)| > ε} |*r*(*i*)|^{p}/*p* + Σ_{|*r*(*i*)| ≤ ε} ε^{p-2} *r*(*i*)^{2}/2 | (12) |

Effectively, if |*r*(*i*)| > ε, a small perturbation of *x* will
maintain this inequality, because of the continuity of the matrix product.
This property also holds for the other inequality, |*r*(*i*)| < ε. It is no longer true for the non-strict inequalities (|*r*(*i*)| ≥ ε or |*r*(*i*)| ≤ ε).
So, we can calculate the gradient of the objective function in equation (12)
and keep the same inequalities; if I call this function *F*, then:

| ∂*F*/∂*x*_{j} = - Σ_{|*r*(*i*)| > ε} |*r*(*i*)|^{p-2} *r*(*i*) *A*(*i*,*j*) - Σ_{|*r*(*i*)| ≤ ε} ε^{p-2} *r*(*i*) *A*(*i*,*j*) | (13) |

Finally, replacing *r*(*i*) by (*y* - *A*.*x*)(*i*), we recover the normal equations (10).
This result holds if the objective function is minimized at a vector *x* such that none of the residuals |*r*(*i*)| is
equal to ε.
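The claim that (10) are the normal equations of (12) can be checked numerically: away from the hyperplanes |*r*(*i*)| = ε, a finite-difference gradient of the objective should match -*A*^{T}(*w*.*r*). A minimal sketch, with my own function names and a small test matrix:

```python
import numpy as np

def mixed_objective(x, A, y, p, eps):
    """Mixed objective of equation (12): |r|^p / p where |r| > eps,
    and eps^(p-2) * r^2 / 2 where |r| <= eps."""
    r = y - A @ x
    big = np.abs(r) > eps
    return (np.abs(r[big]) ** p / p).sum() + (eps ** (p - 2) * r[~big] ** 2 / 2).sum()

def normal_equations_lhs(x, A, y, p, eps):
    """Left-hand side of the normal equations (10): A^T (w * r),
    with the weights w(i) of equation (11)."""
    r = y - A @ x
    w = np.full_like(r, eps ** (p - 2))
    big = np.abs(r) > eps
    w[big] = np.abs(r[big]) ** (p - 2)
    return A.T @ (w * r)
```

The gradient of `mixed_objective` agrees with `-normal_equations_lhs` as long as no residual sits exactly on a hyperplane, which is the condition stated above.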
So, the IRLS algorithm minimizes what could be called a *mixed* objective function.
But this objective function is not convex: combining
any two vectors *x* and *x*' can completely change the distribution of
residuals larger or smaller than ε if these vectors are not close
to each other. In fact, this function is strictly convex inside the regions
delimited by the 2*n*_{y} hyperplanes:

(*y* - *A*.*x*)(*i*) = ±ε,   *i* = 1, ..., *n*_{y}
and it is not continuous on these hyperplanes. In the accompanying
figure, I plotted for example the mixed function
obtained with *p*=1 and ε=2, for |*x*_{i}| < 4. This kind of function
has one global minimum, and no other local minimum. However, if we must cross
hyperplanes to reach the region of the global minimum, the computation of the
gradient near these hyperplanes may fail, and so may the
convergence. This problem does not appear with the IRLS algorithm and the
*L*^{1} norm, because ε is small: the central region is smaller,
and the discontinuity (whose jump is proportional to ε) becomes
negligible. In fact, the fewer hyperplanes are crossed during the algorithm,
the more efficient the convergence will be. Thus, it is better to have
from the beginning as many residuals as possible smaller than ε; that is why the use of a first estimate of the filter enhances the convergence,
and too small an ε hinders it.
There is in fact another reason which explains the slowness of the convergence
of the IRLS algorithm for a small ε. A *theoretical*
*n*-component *L*^{1} solution must produce at least *n* residuals equal to
zero. But during the algorithm, if any of the residuals gets
smaller than ε, it creates a weight ε^{p-2} for the
next iteration. If ε is small, this weight will be relatively
large, and will force the same residual, at the next iteration, to be smaller
than ε. Thus, the algorithm tries to keep this residual small,
independently of whether or not it belongs to the set of theoretical zero
residuals, and we might need many iterations to force the algorithm to
``forget'' this residual.
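The whole scheme can be condensed into a short sketch; this is an illustrative reimplementation of the IRLS iteration described in the text (assuming a dense matrix *A*), not the actual code used for the examples:

```python
import numpy as np

def irls(A, y, p=1.0, eps=1e-4, niter=30):
    """Minimal IRLS sketch: each step solves the weighted least-squares
    system A^T W A x = A^T W y with the weights of equation (11),
    clamped at eps^(p-2) inside the eps band."""
    x = np.linalg.lstsq(A, y, rcond=None)[0]  # L2 solution as first estimate
    for _ in range(niter):
        r = y - A @ x
        w = np.full_like(r, eps ** (p - 2))   # clamped weight inside the band
        big = np.abs(r) > eps
        w[big] = np.abs(r[big]) ** (p - 2)    # |r|^(p-2) outside it
        x = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))
    return x
```

On data with one large outlier and *p* = 1, the iterates move from the *L*^{2} solution (the mean) toward the *L*^{1} solution (the median); a residual that falls below ε keeps the large clamped weight ε^{p-2} at the next step, which is the sticking effect described above.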

The objective function *F* could be interesting for a mixed
*L*^{1}-*L*^{2} problem, for example, if ε were taken large enough,
in the minimization problem:

| min_{x} Σ_{|*r*(*i*)| > ε} |*r*(*i*)| + Σ_{|*r*(*i*)| ≤ ε} *r*(*i*)^{2}/(2ε) | (14) |

The convergence would be ensured by the proximity of the *L*^{2} minimization
problem. Problems of computation of the gradient on the hyperplanes could be
solved by smoothing the objective function near these hyperplanes.
Such a smoothing could be assured by replacing each term by (*r*(*i*)^{2} + ε^{2})^{p/2}/*p*,
but this choice would not actually correspond to the same minimization problem.
An intermediate choice would be to assure only the continuity of the objective
function by adding the constant ε^{p}(1/*p* - 1/2) to each quadratic term.
The *mixed* function of the problem (14) could handle two kinds
of noise at the same time: spiky high-amplitude noise with the *L*^{1} norm,
and Gaussian noise with the *L*^{2} norm and the damping factor ε.
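The three per-residual choices discussed here can be compared numerically: the discontinuous term of equation (12), a fully smoothed variant, and a continuity-only correction. The explicit smoothed and corrected formulas below are plausible reconstructions consistent with the surrounding text, not taken verbatim from it:

```python
import numpy as np

def f_mixed(r, p, eps):
    """Per-residual term of equation (12); discontinuous at |r| = eps."""
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) > eps, np.abs(r) ** p / p, eps ** (p - 2) * r ** 2 / 2)

def f_smooth(r, p, eps):
    """Fully smoothed variant, (r^2 + eps^2)^(p/2) / p: smooth everywhere,
    but a different minimization problem (a reconstructed formula)."""
    r = np.asarray(r, dtype=float)
    return (r ** 2 + eps ** 2) ** (p / 2) / p

def f_continuous(r, p, eps):
    """Continuity-only fix: add eps^p * (1/p - 1/2) inside the band so both
    pieces agree at |r| = eps (again a reconstructed formula)."""
    r = np.asarray(r, dtype=float)
    inside = eps ** (p - 2) * r ** 2 / 2 + eps ** p * (1 / p - 1 / 2)
    return np.where(np.abs(r) > eps, np.abs(r) ** p / p, inside)
```

At |*r*| = ε with *p* = 1, `f_mixed` jumps by ε/2, the discontinuity proportional to ε mentioned above, while `f_continuous` closes that gap.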

Stanford Exploration Project

1/13/1998