
Nonlinear L.S. conjugate-direction template

Nonlinear optimization arises from two causes:

1.
Nonlinear physics. The operator depends upon the solution being attained.
2.
Nonlinear statistics. We need robust estimators like the L1 norm.
The computing template below is useful in both cases. It is almost the same as the template for weighted linear least squares, except that the residual is recomputed at each iteration. Starting from the usual weighted least-squares template, we simply move the iteration statement a bit earlier.

		 iterate {
		 		 $\bold r \quad\longleftarrow\quad \bold F \bold m - \bold d$
		 		 $\bold W \quad\longleftarrow\quad {\bf diag}[w(\bold r)]$
		 		 $\bold r \quad\longleftarrow\quad \bold W \bold r$
		 		 $\Delta\bold m \quad\longleftarrow\quad \bold F' \bold W'\, \bold r$
		 		 $\Delta\bold r \quad\longleftarrow\quad \bold W \bold F\, \Delta\bold m$
		 		 $(\bold m,\bold r) \quad\longleftarrow\quad {\rm cgstep}(\bold m,\bold r,\Delta\bold m,\Delta\bold r)$
		 }
where ${\bf diag}[w(\bold r)]$ places whatever weighting function we choose along the diagonal of a diagonal matrix.

Now let us see how the weighting functions relate to robust estimation. Notice in the code template that $\bold W$ is applied twice in the definition of $\Delta\bold m$. Thus $\bold W$ is the square root of the diagonal operator in equation (13).
 \begin{displaymath}
\bold W \eq \ {\bf diag} \left( {1\over\sqrt{\sqrt{1+r_i^2/\bar r^2}}} \right)\end{displaymath} (14)
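Squaring the diagonal entries of (14), which is what applying $\bold W$ twice accomplishes, recovers the diagonal operator of equation (13):
 \begin{displaymath}
\bold W' \bold W \eq {\bf diag} \left( {1\over\sqrt{1+r_i^2/\bar r^2}} \right)\end{displaymath}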

Module weight_solver implements the computational template above. In addition to the usual set of arguments of the solver() subroutine, it accepts a user-defined function (parameter wght) for computing residual weights; an example weighting routine based on equation (14) is sketched after the module. Parameters nmem and nfreq control the restarting schedule of the iterative scheme.

module weight_solver { 
  logical, parameter, private  :: T = .true., F = .false.
contains
  subroutine solver( oper, solv, x, dat, niter, nmem, nfreq, wght) {
    interface {
       integer function wght( res, w) {
           real, dimension (:)  :: res, w
           }
       integer function oper( adj, add, x, dat) {
           logical, intent (in) :: adj, add
           real, dimension (:)  :: x, dat
           }
       integer function solv( forget, x, g, rr, gg) {
           logical             :: forget
           real, dimension (:) :: x, g, rr, gg
           }
       }
    real, dimension (:), intent (in)  :: dat                # data
    real, dimension (:), intent (out) :: x                  # solution
    integer,             intent (in)  :: niter, nmem, nfreq # iterations
    real, dimension (size (x))        :: g                  # gradient
    real, dimension (size (dat))      :: rr, gg, wt         # res, CG, weight
    integer                           :: i, stat
    logical                           :: forget
    rr = -dat;  x = 0.;  wt = 1.;  forget = F         # initialize
    do i = 1, niter {
       forget = (i > nmem)                            # restart
       if( forget) stat = wght( rr, wt)               # compute weighting
       rr = rr * wt                                   # rr = W (Fx - d)
       stat = oper( T, F,  g, wt*rr)                  # g  = F' W' rr 
       stat = oper( F, F,  g, gg)                     # gg = Fg
       gg = gg * wt                                   # gg = W F g
       if( forget)  forget = ( mod( i, nfreq) == 0)   # periodic restart
       stat = solv( forget, x, g, rr, gg)             # step in x and rr
       rr = - dat
       stat = oper( F, T, x, rr)                      # rr = Fx - d
    }
  }
}
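
For concreteness, here is a minimal sketch of a user-supplied weighting routine that matches the wght interface above and applies equation (14). It is only an illustration, not part of the original module: the name cauchy_wght is invented here, and taking $\bar r$ to be the rms of the current residual is an assumption, since the text does not prescribe how $\bar r$ is chosen.

integer function cauchy_wght( res, w) {
  real, dimension (:)  :: res, w                 # residual in, weights out
  real                 :: rbar
  rbar = sqrt( sum( res*res) / size( res))       # assumed scale: rms of current residual
  if( rbar == 0.) rbar = 1.                      # guard against an all-zero residual
  w = 1. / sqrt( sqrt( 1. + (res/rbar)**2))      # equation (14)
  cauchy_wght = 0
}

Passing such a routine as the wght argument of solver(), with nmem and nfreq chosen so that the weights are updated only occasionally, drives the iteration toward the robust estimate.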

We can ask whether cgstep(), which was not designed with nonlinear least-squares in mind, is doing the right thing with the weighting function. First, we know we are doing weighted linear least-squares correctly. Then we recall that on the first iteration, the conjugate-directions technique reduces to steepest descent, which amounts to a calculation of the scale factor $\alpha$ with  
 \begin{displaymath}
\alpha \eq -\ { \Delta\bold r \cdot \bold r \over \Delta\bold r \cdot \Delta\bold r }\end{displaymath} (15)
Of course, cgstep() knows nothing about the weighting function, but notice that the iteration loop above nicely inserts the weighting function both in $\bold r$ and in $\Delta \bold r$, as required by (15).
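
In code terms, the first call to solv() receives the already-weighted arrays rr and gg, so a steepest-descent scale factor computed from them automatically uses the weighted quantities of (15). The sketch below is not part of cgstep(); the name alpha_sd is invented here purely to show equation (15) acting on those arrays.

real function alpha_sd( gg, rr) {
  real, dimension (:) :: gg, rr                              # gg = W F dm,  rr = W (F m - d)
  alpha_sd = - dot_product( gg, rr) / dot_product( gg, gg)   # equation (15)
}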

Experience shows that difficulties arise when the weighting function varies rapidly from one iteration to the next. Naturally, the conjugate-direction method, which remembers the previous iteration, has an inappropriate memory if the weighting function changes too quickly. A practical remedy is to make sure the weighting function changes only slowly from one iteration to the next.
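
One way to keep the weights slowly variable, offered here only as an illustration and not as something the original module does, is to relax the weights toward their new values instead of replacing them outright. The routine below (the name smoothed_wght and the relaxation factor 0.5 are arbitrary choices) exploits the fact that the array w retains its previous values between calls to wght():

integer function smoothed_wght( res, w) {
  real, dimension (:)        :: res, w            # on entry, w holds the previous weights
  real, dimension (size (w)) :: wnew
  real                       :: rbar
  rbar = sqrt( sum( res*res) / size( res))        # assumed scale: rms of current residual
  if( rbar == 0.) rbar = 1.
  wnew = 1. / sqrt( sqrt( 1. + (res/rbar)**2))    # raw weights from equation (14)
  w = 0.5*w + 0.5*wnew                            # relax toward the new weights gradually
  smoothed_wght = 0
}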

