How To Solve a Bivariate L-1 Norm Regression
by Jon 6-22-2009

My last lecture claimed that once we had a bivariate L1 solver we could pretty much stick it in cg_step to get a multivariate L1 solver. Here I present the required low priced exact bivariate solver.

This is important because "everybody" uses cg_step. And since they do, we'll have a huge number of test cases. Hooray! I think this will be much better than IRLS because

We'll need test cases to see if we are achieving my claims. With small test cases we'll see if L1 produces the step functions we need to represent a "compartmentalized" earth. We really, really want to know that, and get some experience with it to understand its potentialities and limitations. So far we envision (1) the Jordan River test case and (2) the Interval Velocity test case, both synthetic (not real).

The L1 world naturally deals in exact solutions. The solution is exact in a problem of M variables in the sense that the descent leads to M equations being exactly satisfied. The multivariate L1 algorithm knows when it gets there. We can continue to give "niter" as an upper bound on the number of iterations, but the iteration will stop earlier if it arrives at the answer. It will be interesting to see how often this happens. If precision becomes an issue, you can always restart. In fact, once a solution is obtained, you might routinely restart to see if basis equations stay fixed, evidence that precision is not an issue.

Getting started

I assume now from my previous lecture, you know everything about the weighted median and its solvers, and why we need a bivariate solver (because for each iteration of a multivariate solver we need to search the plane spanned by the gradient and the previous step). Here it comes now, the exact solution to the bivariate solver in a fairly small number of steps.

This is a bivariate optimization program embedded in a multivariate optimization program. I'll not review the multivariate problem (see GEE), but when I write

0 \approx SUM_i | g_i a + s_i b - r_i |

you'll know where the g_i, s_i, and r_i have come from, and you'll suspect in that world above, a=alpha and b=beta.

For my first iteration in this bivariate program I'll start from (a,b)=(0,0) updating only a. Later iterations I'll update both a and b. Using my weighted median code, I'll solve this L1 regression

0 \approx SUM_i | gamma g_i da - r_i |

giving me a value for gamma, and a j-th equation that fits exactly. Then we set (a,b)= gamma*(da,0) and we update the residuals r_i.

In subsequent iterations both (a,b) will update. Suppose the j-th equation now fits exactly and we are ready to do another gamma scan to find a perfectly fitting k-th equation. How can we do this?

At (a,b) the j-th equation now fits exactly so its residual vanishes, r_j=0. Can we maintain that exact fit during a new scan? Yes, if we keep our next da and db in a certain ratio. What ratio must that be? The perfect fitting equation is

g_j da + s_j db = 0

so that required ratio is (da,db) proportional to (s_j,-g_j). Thus we scan (da,db) = newgamma*(s_j,-g_j) over newgamma. This means we create a column

c_i = g_i s_j - s_i g_j

and run the weighted median solver on

0 \approx SUM_i | c_i *newgamma - r_i|

getting newgamma and a new perfect fitting equation, we'll call the k-th equation. On our next iteration, we'll keep that new k-th one and drop the requirement of fitting the old j-th equation. The next iteration proceeds like the previous, each stage updating (a,b) and the residuals. Eventually we find ourselves not moving, newgamma=0 which means we are finished. Lastly we pass (a,b) back to the multivariate solver to do one of its iterations.

A geometrical interpretation of the iteration process is intriguing, leading to proofs and speculations. Throw down many random straight lines in the (a,b)-plane. The number of lines is the number of residuals, r_i. Scanning gamma is moving along a line ultimately chosing the best line to cross over to for the next scan. Since we always leave a line at the best location (lowest summed residual) we can never reenter at a higher location. That proves we cannot go on forever. The maximum number of lines we could have to scan would be all of them. Hopefully we'd never be that unlucky! I expect we'll have fun plotting some of these descents.

One problem I have not mentioned is the problem of a flat bottom, such as the median of an even number of points. We are unlikely to see flat bottoms on equations involving real data, but regularization regression equations, like a vanishing derivative of model space definately will give flat bottoms. The way to explore these is with our old "geostat" trick, solve the problem many times each with a different noise added to the regularization. Flat bottoms require a little attention in the coding, but nothing I could not deal with 36 years ago in the code written then which we now have (in FORTRAN 66). That multivariate code is only 264 lines, including a test program, so a bivariate code should not be too hard for today's graduate students, especially since the weighted median solver (half the code) is already done for them.

The replacement for cgstep(), let us call it l1step().