previous up next print clean
Next: Discussion Up: IMPLEMENTATION ON THE CONNECTION Previous: Parallel computation of the

Parallel solution by the conjugate gradient method

The coordinate and boundary information is no longer needed, once the operator has been constructed and boundary conditions have been applied. All that remains is to solve the system of linear equations. This can be efficiently done by the CG method. The CM code for the iterations of the CG algorithm is almost a one-to-one match with the algorithm outline given in Equation 1. This is one of the more attractive features of CM-Fortran, there is none of the extra clutter of do loops to obscure the meaning of the code. Figure [*] shows the conjugate gradient loop in its entirety.

DO WHILE( NORMRATIO > ERRLIM )

C Distribute the step information to all the nodes. CALL DISTRIBUTE( STEP, LOCSTEP, NNODEX, NNODEY )

C apply the operator CONJSTEP = 0. DO I =1,9 CONJSTEP = CONJSTEP + OPERATOR(I,:,:) * LOCSTEP(I,:,:) END DO

C calculate alpha ALPHA = PREVNORM / SUM( STEP * CONJSTEP )

C update soln SOLN = SOLN + ALPHA * STEP

C update residuals RESID = RESID - ALPHA * CONJSTEP

C calculate norm of residuals and beta RESIDNORM = SUM( RESID*RESID ) BETA = RESIDNORM/PREVNORM

C update the step STEP = RESID + BETA * STEP

C calculate relative error in residuals. NORMRATIO = RESIDNORM/RHSNORM

PREVNORM = RESIDNORM

END DO


CM-Fortran code for the CG loop
The most important point about this code is that there is only one distribution step per iteration. This is the minimum possible amount of communication for the CG algorithm. The only other interprocessor communication is in the ``SUM'' operation which is used to calculate the inner products.

The layout of the arrays and the coding of the distribution routine depend on the connectivity of the nodes. In the model problem the nodes can be mapped to a 2-D array and the distribution can be performed as a series of shift operations. Figure [*] shows the CM code to define the array layout and Figure [*] the routine ``DISTRIBUTE''.

REAL OPERATOR(9,NNODEX,NNODEY) REAL LOCSTEP(9,NNODEX,NNODEY) CMF$LAYOUT OPERATOR(:SERIAL,:NEWS,:NEWS) CMF$LAYOUT LOCSTEP(:SERIAL,:NEWS,:NEWS)

REAL SOLN(NNODEX,NNODEY) REAL STEP(NNODEX,NNODEY) REAL RESID(NNODEX,NNODEY) REAL CONJSTEP(NNODEX,NNODEY) CMF$LAYOUT SOLN(:NEWS,:NEWS) CMF$LAYOUT RESID(:NEWS,:NEWS) CMF$LAYOUT STEP(:NEWS,:NEWS) CMF$LAYOUT CONJSTEP(:NEWS,:NEWS)


Definition of the arrays for a regular 2-D problem. The operator and the work vector for step information are serial vectors of nine points on each processor. All the other arrays have one element per processor.

SUBROUTINE DISTRIBUTE( INPUT, OUTPUT, NNODEX, NNODEY )

C Gather information from all connected nodes. C This can be done with shift operations. C INTEGER NNODEX, NNODEY

REAL OUTPUT(9,NNODEX,NNODEY) CMF$LAYOUT OUTPUT(:SERIAL,:NEWS,:NEWS)

REAL INPUT(NNODEX,NNODEY) CMF$LAYOUT INPUT(:NEWS,:NEWS)

OUTPUT(5,:,:) = INPUT

OUTPUT(4,:,:) = CSHIFT( OUTPUT(5,:,:), DIM=1, SHIFT=-1 ) OUTPUT(1,:,:) = CSHIFT( OUTPUT(4,:,:), DIM=2, SHIFT=-1 ) OUTPUT(7,:,:) = CSHIFT( OUTPUT(4,:,:), DIM=2, SHIFT=1 )

OUTPUT(6,:,:) = CSHIFT( OUTPUT(5,:,:), DIM=1, SHIFT=1 ) OUTPUT(3,:,:) = CSHIFT( OUTPUT(6,:,:), DIM=2, SHIFT=-1 ) OUTPUT(9,:,:) = CSHIFT( OUTPUT(6,:,:), DIM=2, SHIFT=1 )

OUTPUT(2,:,:) = CSHIFT( OUTPUT(5,:,:), DIM=2, SHIFT=-1 ) OUTPUT(8,:,:) = CSHIFT( OUTPUT(5,:,:), DIM=2, SHIFT=1 )

RETURN END


Routine to distribute step information to all processors for a 2-D mesh.

Table 7 shows the timings for the model problem. The first set of figures is for the code given in Figure [*]. The second set of figures is for a version of the code in which the distribution and application steps were replaced by a call to a routine compiled with the stencil compiler. This is a special purpose compiler for compiling algorithms that can be coded as a stencil operation. Biondo Biondi used a pre-release version of this compiler to create the routine that I used.

There are several points to note:


  
Figure 7: Timing figures for the model problem.
\begin{figure}
\centerline{
\begin{tabular}
{\vert\vert l\vert r\vert r\vert r\v...
 ...es512$\space & .22 & 12.7 & 792 & 276 \\ \hline \hline\end{tabular}}\end{figure}


previous up next print clean
Next: Discussion Up: IMPLEMENTATION ON THE CONNECTION Previous: Parallel computation of the
Stanford Exploration Project
12/18/1997