$A^T W_k A \, x_{k+1} = A^T W_k \, y$   (6)
$W_k$ plays the role of a weighting matrix, the inverse of a covariance matrix. By giving large residuals small weights (i.e., large variances), it reduces their role; conversely, well-predicted values are given large weights (small variances). Aberrant values, receiving low weights, therefore have little influence on the solution: this makes the algorithm robust, less sensitive to noise bursts in the data. This is especially true in the extreme case p = 1, as predicted by Claerbout and Muir (1973).
However, the L1 norm is not strictly convex, so the L1 minimization problem may have non-unique solutions, and the IRLS algorithm will lead us to one of them. The fact that the first iteration ($W_0 = I_n$) is a least-squares inversion makes me think that the algorithm will converge to a solution close to the L2 solution. The choice $W_0(i) = |y(i)|^{p-2}$ is not reasonable: with it, x stays close to 0 throughout the iterations, and the algorithm converges (if it does) to a vector also close to zero, the residuals remaining close to the original vector y.
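The scheme above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the function name `irls`, the iteration count, and the clipping threshold `eps` (which guards the weights $w_i = |r_i|^{p-2}$ against division by zero when a residual vanishes) are my own choices. The first iteration uses $W_0 = I_n$, i.e., an ordinary least-squares solve, as the text recommends.

```python
import numpy as np

def irls(A, y, p=1.0, n_iter=50, eps=1e-8):
    """Iteratively reweighted least squares for min ||A x - y||_p (sketch).

    Starts from the least-squares solution (W_0 = I_n), then repeatedly
    solves the weighted normal equations  A^T W_k A x_{k+1} = A^T W_k y  (6)
    with diagonal weights w_i = |r_i|^(p-2) computed from the residuals.
    """
    x = np.linalg.lstsq(A, y, rcond=None)[0]  # first iteration: W_0 = I_n
    for _ in range(n_iter):
        r = y - A @ x
        # For p < 2, large residuals get small weights (robustness);
        # eps keeps the weights finite when a residual is near zero.
        w = np.maximum(np.abs(r), eps) ** (p - 2)
        W = np.diag(w)
        x = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return x
```

With p = 1 on a toy problem of fitting a constant to the data (1, 1, 10), the iterations pull the estimate toward the median 1 rather than the mean 4, illustrating the insensitivity to the aberrant value 10; with p = 2 all weights equal 1 and the least-squares mean is recovered.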