Jon Claerbout and Dave Nichols
Author has no known email address
In our C++ project we plan to supply reusable least-squares solvers. This led to a question that we are only beginning to answer, ``are there any helpful general purpose preconditioners?'' The one described in this note applies to operators that are representable in partitioned form, such as the regression
We came up with two possibilities for rescaling before going to a conjugate-gradient (CG) solver. These two possibilities are:
We dislike (3) because we don't know how to specify the random numbers. We know the numbers should not be all ``1'' because d/dx and are important operators and their norm is clearly non zero whereas it might appear to be zero if the random numbers were taken to be all ``1''. On the other hand should the random numbers be all positive or both positive and negative? We are not sure we can make a prior specification that would be generally satisfactory. While positive random numbers might be safe enough, it is worth considering (2).
Can it really be that (2) contains an idea not already inside the CG solver? After all, the first step of CG computes the sum of these two terms. Apparently yes, there is an important idea missing from the CG solver which has no prior knowledge that the operator can be partitioned into the and parts.
Partitioned problems of the kind discussed here frequently arise from nonlinear optimization where there are two partial derivative matrices with different physical dimensions. Since the problems are nonlinear and the operators themselves change as the iteration proceeds, it is natural to think of changing the preconditioning scalars as the iteration proceeds. In the linear case, however, rescaling with iteration would mess up the CG solver. We asked Michael Saunders about the existence of nonlinear solvers for this type of problem, but he did not have any immediate leads.
An interesting close-to-home example of a partitioned regression problem is found in PVI page 182. This example is a one-dimensional missing data problem. There the vector contains the 3-5 components of a prediction-error filter (dimensionless transfer function) whereas the vector would have the typical 1-2 thousand time samples of a seismogram (units of pressure, for example). We have not tested the two alternative preconditioners suggested in this paper because we have not yet done the programming. Also, a more meaningful evaluation will arise in a multidimensional context where preconditioning is more important, and where more programming, and perhaps some more research, are required.
Two apparently related references are Kennett et al. (1988) and Golub and Varah (1974).