\[ r(i) = (y - A\,x)(i) \]
Effectively, if $|r(i)| < \varepsilon$, a small perturbation of $x$ will maintain this inequality, because of the continuity of the matrix product. This property also holds for the other inequality, $|r(i)| > \varepsilon$. It is no longer true for non-strict inequalities ($\le$ or $\ge$). So, we can calculate the gradient of the objective function in equation (12) while keeping the same inequalities; if I call this function $F_\varepsilon$, then:
\[ F_\varepsilon(x) \;=\; \sum_{|r(i)| \le \varepsilon} \frac{r(i)^2}{2\,\varepsilon^{2-p}} \;+\; \sum_{|r(i)| > \varepsilon} \frac{|r(i)|^p}{p}. \]
So, the IRLS algorithm minimizes what could be called a mixed function. But this objective function is not convex: for two vectors $x$ and $x'$ that are not close to each other, the partition of the residuals between those larger and those smaller than $\varepsilon$ can be completely different. In fact, this function is strictly convex only inside the regions delimited by the $2\,n_y$ hyperplanes:
\[ (y - A\,x)(i) = \pm\,\varepsilon, \qquad i = 1, \dots, n_y, \]
and it is not continuous on these hyperplanes. On the figure, I plotted an example of such a function for $|x_i| < 4$, obtained with $p = 1$ and $\varepsilon = 2$. This kind of function has one global minimum and no other local minimum. However, if the hyperplanes must be crossed to reach the region of the global minimum, the computation of the gradient near these hyperplanes may be problematic, and so may the convergence. This problem does not appear with the IRLS algorithm and the L1 norm, because $\varepsilon$ is small: the central region is smaller, and the discontinuity (whose jump is proportional to $\varepsilon$) becomes negligible. In fact, the fewer hyperplanes are crossed during the algorithm, the more efficient the convergence will be. Thus, it is better to have, from the beginning, as many residuals as possible smaller than $\varepsilon$; that is why the use of a first estimate of the filter enhances the convergence, and too small an $\varepsilon$ hinders it.
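As a short numerical sketch of this discontinuity, assuming the mixed function takes the thresholded form discussed above (quadratic branch $r^2/(2\varepsilon^{2-p})$ for $|r| \le \varepsilon$, $|r|^p/p$ beyond; the function name and the sample values are illustrative):

```python
import numpy as np

def mixed_objective(r, p=1.0, eps=2.0):
    """Thresholded mixed objective: quadratic below eps, Lp above (a sketch)."""
    r = np.asarray(r, dtype=float)
    small = np.abs(r) <= eps
    quad = r**2 / (2.0 * eps**(2.0 - p))   # branch for |r| <= eps
    lp = np.abs(r)**p / p                  # branch for |r| >  eps
    return float(np.sum(np.where(small, quad, lp)))

# The jump across the hyperplane |r_i| = eps, here with p = 1 and eps = 2:
p, eps = 1.0, 2.0
below = mixed_objective([eps - 1e-9], p, eps)   # quadratic branch, ~ eps/2
above = mixed_objective([eps + 1e-9], p, eps)   # L1 branch, ~ eps
jump = above - below                            # ~ eps/2 for p = 1
```

For $p = 1$ the jump is $\varepsilon/2$, which indeed becomes negligible as $\varepsilon \to 0$.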
There is in fact another reason which explains the slow convergence of the IRLS algorithm for a small $\varepsilon$. A theoretical $n$-component L1 solution must produce at least $n$ residuals equal to zero. But during the algorithm, if any residual gets smaller than $\varepsilon$, it creates a weight $\varepsilon^{p-2}$ for the next iteration. If $\varepsilon$ is small, this weight will be relatively large, and will force the same residual, at the next iteration, to be smaller than $\varepsilon$ again. Thus, the algorithm tries to keep this residual small, whether or not it belongs to the set of theoretical zero residuals, and many iterations may be needed to force the algorithm to ``forget'' this residual.
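The weighting mechanism can be sketched as follows (a minimal illustration, assuming the standard thresholded IRLS rule $w_i = \max(|r_i|, \varepsilon)^{p-2}$; the matrix, data, and spike are made up for the example):

```python
import numpy as np

def irls(A, y, x0, p=1.0, eps=1e-4, n_iter=100):
    """Thresholded IRLS: repeated weighted least squares (a sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        r = y - A @ x
        # every residual already below eps receives the same large weight
        # eps**(p-2), which tends to pin it below eps at the next iteration too
        w = np.maximum(np.abs(r), eps) ** (p - 2.0)
        Aw = A * w[:, None]                      # W A
        x = np.linalg.solve(A.T @ Aw, Aw.T @ y)  # solve A^T W A x = A^T W y
    return x

# Toy problem (illustrative): 20 equations, 3 unknowns, one spiky outlier.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))
x_true = np.array([1.0, -2.0, 0.5])
y = A @ x_true
y[0] += 10.0                      # spike the first datum
x_l1 = irls(A, y, np.zeros(3))    # the L1-type fit rejects the spike
```

The spiked equation ends with a small weight and is effectively ignored, while the consistent equations keep residuals below $\varepsilon$ and dominate the weighted normal equations.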
The objective function $F_\varepsilon$ could be interesting for a mixed L1-L2 problem, for example if $\varepsilon$ were taken large enough, in the minimization problem:
\[ \min_x \left( \sum_{|r(i)| \le \varepsilon} \frac{r(i)^2}{2\,\varepsilon} \;+\; \sum_{|r(i)| > \varepsilon} |r(i)| \right) \qquad (14) \]
but this choice would not actually correspond to the same minimization problem. An intermediate choice would be to ensure only the continuity of the objective function, by taking:
\[ F_\varepsilon(x) \;=\; \sum_{|r(i)| \le \varepsilon} \left( \frac{r(i)^2}{2\,\varepsilon} + \frac{\varepsilon}{2} \right) \;+\; \sum_{|r(i)| > \varepsilon} |r(i)|. \]
The mixed function of problem (14) could then handle two kinds of noise at the same time: spiky high-amplitude noise with the L1 norm, and Gaussian noise with the L2 norm and the damping factor $\varepsilon$.
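A sketch of such a continuous mixed L1-L2 penalty for $p = 1$ (a Huber-type function; adding the constant $\varepsilon/2$ to the quadratic branch is one possible continuity convention, assumed here):

```python
import numpy as np

def huber(r, eps):
    """Continuous mixed L1-L2 penalty: quadratic below eps, linear above."""
    a = np.abs(np.asarray(r, dtype=float))
    # the + eps/2 offset makes the two branches agree at |r| = eps
    return np.where(a <= eps, a**2 / (2.0 * eps) + eps / 2.0, a)

eps = 1.0
# Gaussian-scale residuals fall in the quadratic (L2) branch, while a
# spiky outlier is only penalized linearly (L1), so it cannot dominate:
vals = huber(np.array([0.1, 5.0]), eps)   # -> [0.505, 5.0]
```

The quadratic branch damps small (Gaussian) residuals, while large spikes grow only linearly, matching the two noise regimes described above.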