Nonlinear optimization arises from two causes: (1) nonlinear physics, in which the operator depends on the model being estimated; and (2) nonlinear statistics, in which the weighting applied to the residuals depends on the residuals themselves, as it does in robust estimation. Here we take up the second cause. The computational template is

        iterate {
            r  ←  F m - d
            W  ←  diag[ w(r) ]
            Δm ←  F' W' W r
            Δr ←  W F Δm
            (m, r)  ←  step( m, r, Δm, Δr )
            }

where w(r) is whatever weighting function we choose along the diagonal of the diagonal matrix W.
Now let us see how the weighting functions relate to robust estimation. Notice in the code template that W is applied twice in the definition of Δm. Thus W is the square root of the diagonal operator in equation (13):
$\Delta m \;=\; F'\, W'\, W \,( F m - d )$    (14)
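As a concrete example of such a weighting function (a standard choice for robust estimation, used here purely for illustration), take $w_i = |r_i|^{-1/2}$. Then the weighted quadratic objective measures the $\ell_1$ norm of the residual:

$\| W r \|^2 \;=\; \sum_i w_i^2\, r_i^2 \;=\; \sum_i \frac{r_i^2}{|r_i|} \;=\; \sum_i |r_i|$

Minimizing this weighted sum of squares therefore amounts to minimizing the sum of absolute residuals, which is far less sensitive to outliers than ordinary least squares.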
Module weight_solver implements the computational template above. In addition to the usual set of arguments from the solver() subroutine, it accepts a user-defined function (parameter wght) for computing residual weights; an illustrative weight function is sketched after the listing. Parameters nmem and nfreq control the restarting schedule of the iterative scheme.
module weight_solver {
  logical, parameter, private :: T = .true., F = .false.
contains
  subroutine solver( oper, solv, x, dat, niter, nmem, nfreq, wght) {
    interface {
      integer function wght( res, w) {
        real, dimension (:)           :: res, w
        }
      integer function oper( adj, add, x, dat) {
        logical, intent (in)          :: adj, add
        real, dimension (:)           :: x, dat
        }
      integer function solv( forget, x, g, rr, gg) {
        logical                       :: forget
        real, dimension (:)           :: x, g, rr, gg
        }
      }
    real, dimension (:), intent (in)  :: dat                  # data
    real, dimension (:), intent (out) :: x                    # solution
    integer,             intent (in)  :: niter, nmem, nfreq   # iterations
    real, dimension (size (x))        :: g                    # gradient
    real, dimension (size (dat))      :: rr, gg, wt           # res, CG, weight
    integer                           :: i, stat
    logical                           :: forget
    rr = -dat;  x = 0.;  wt = 1.;  forget = F                 # initialize
    do i = 1, niter {
      forget = ( i > nmem)                                    # restart
      if( forget) stat = wght( rr, wt)                        # compute weighting
      rr = rr * wt                                            # rr = W (Fx - d)
      stat = oper( T, F, g, wt*rr)                            # g  = F' W' rr
      stat = oper( F, F, g, gg)                               # gg = F g
      gg = gg * wt                                            # gg = W F g
      if( forget) forget = ( mod( i, nfreq) == 0)             # periodic restart
      stat = solv( forget, x, g, rr, gg)                      # step in x and rr
      rr = - dat
      stat = oper( F, T, x, rr)                               # rr = F x - d
      }
    }
  }
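As an illustration of the wght argument, here is a minimal sketch of a residual-weight function in the same Ratfor90 style. It is not part of the module above; the module name l1wt, the function name l1wght, and the stabilizer eps are assumptions made for this example. It implements the $\ell_1$-style weight $w_i = |r_i|^{-1/2}$ discussed earlier.

module l1wt {                                      # illustrative sketch, not from the source
contains
  integer function l1wght( res, w) {
    real, dimension (:)  ::  res, w
    real                 ::  eps
    eps = 0.01 * sum( abs( res)) / size( res)      # assumed stabilizer against division by zero
    w = 1. / sqrt( abs( res) + eps)                # w_i = |r_i|^(-1/2), so w_i^2 r_i^2 ~ |r_i|
    l1wght = 0
    }
  }

A hypothetical call would then be solver( myoper, cgstep, x, dat, niter, nmem, nfreq, l1wght), where myoper stands for any linear operator conforming to the oper interface and cgstep is the conjugate-direction step routine discussed next.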
We can ask whether cgstep(), which was not designed with nonlinear least-squares in mind, is doing the right thing with the weighting function. First, as long as the weights are held fixed, we know we are doing weighted linear least-squares correctly. Then we recall that on the first iteration, the conjugate-directions technique reduces to steepest descent, which amounts to a calculation of the scale factor $\alpha$ with

$\alpha \;=\; -\,\frac{\Delta r \cdot r}{\Delta r \cdot \Delta r}$    (15)
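For the record, this is just the standard least-squares minimizer of the residual along the step direction, with $r$ here being the weighted residual and $\Delta r = W F\,\Delta m$ its perturbation:

$\frac{d}{d\alpha}\,\| r + \alpha\,\Delta r \|^2 \;=\; 2\,\Delta r \cdot ( r + \alpha\,\Delta r ) \;=\; 0 \quad\Longrightarrow\quad \alpha \;=\; -\,\frac{\Delta r \cdot r}{\Delta r \cdot \Delta r}$

Since the weights enter only through $r$ and $\Delta r$, cgstep() needs no knowledge of the weighting function itself.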
Experience shows that difficulties arise when the weighting function varies rapidly from one iteration to the next. Naturally, the conjugate-direction method, which remembers the previous iteration, will have an inappropriate memory if the weighting function changes too rapidly. A practical approach is to make sure that the weighting function changes only slowly from one iteration to the next.
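The text does not prescribe a particular recipe for this. As one illustrative possibility (an assumption, building on the hypothetical l1wt sketch above), the weight function itself can blend each newly computed weight with the value it returned on the previous call, so the weights drift rather than jump:

module damped_wt {                                 # illustrative sketch; blending scheme assumed
  use l1wt                                         # the hypothetical weight module sketched above
  real, dimension (:), allocatable, private :: wtold    # weights kept from the previous call
contains
  integer function dampwght( res, w) {
    real, dimension (:)  ::  res, w
    integer              ::  stat
    stat = l1wght( res, w)                         # raw robust weights
    if( allocated( wtold))  w = sqrt( w * wtold)   # geometric blend keeps the change gradual
    else                    allocate( wtold( size( w)))
    wtold = w
    dampwght = 0
    }
  }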