The coordinate and boundary information is no longer needed, once the operator has been constructed and boundary conditions have been applied. All that remains is to solve the system of linear equations. This can be efficiently done by the CG method. The CM code for the iterations of the CG algorithm is almost a one-to-one match with the algorithm outline given in Equation 1. This is one of the more attractive features of CM-Fortran, there is none of the extra clutter of do loops to obscure the meaning of the code. Figure shows the conjugate gradient loop in its entirety.
DO WHILE( NORMRATIO > ERRLIM )
C Distribute the step information to all the nodes. CALL DISTRIBUTE( STEP, LOCSTEP, NNODEX, NNODEY )
C apply the operator CONJSTEP = 0. DO I =1,9 CONJSTEP = CONJSTEP + OPERATOR(I,:,:) * LOCSTEP(I,:,:) END DO
C calculate alpha ALPHA = PREVNORM / SUM( STEP * CONJSTEP )
C update soln SOLN = SOLN + ALPHA * STEP
C update residuals RESID = RESID - ALPHA * CONJSTEP
C calculate norm of residuals and beta RESIDNORM = SUM( RESID*RESID ) BETA = RESIDNORM/PREVNORM
C update the step STEP = RESID + BETA * STEP
C calculate relative error in residuals. NORMRATIO = RESIDNORM/RHSNORM
PREVNORM = RESIDNORM
END DO
The layout of the arrays and the coding of the distribution routine depend on the connectivity of the nodes. In the model problem the nodes can be mapped to a 2-D array and the distribution can be performed as a series of shift operations. Figure shows the CM code to define the array layout and Figure the routine ``DISTRIBUTE''.
REAL OPERATOR(9,NNODEX,NNODEY) REAL LOCSTEP(9,NNODEX,NNODEY) CMF$LAYOUT OPERATOR(:SERIAL,:NEWS,:NEWS) CMF$LAYOUT LOCSTEP(:SERIAL,:NEWS,:NEWS)
REAL SOLN(NNODEX,NNODEY) REAL STEP(NNODEX,NNODEY) REAL RESID(NNODEX,NNODEY) REAL CONJSTEP(NNODEX,NNODEY) CMF$LAYOUT SOLN(:NEWS,:NEWS) CMF$LAYOUT RESID(:NEWS,:NEWS) CMF$LAYOUT STEP(:NEWS,:NEWS) CMF$LAYOUT CONJSTEP(:NEWS,:NEWS)
SUBROUTINE DISTRIBUTE( INPUT, OUTPUT, NNODEX, NNODEY )
C Gather information from all connected nodes. C This can be done with shift operations. C INTEGER NNODEX, NNODEY
REAL OUTPUT(9,NNODEX,NNODEY) CMF$LAYOUT OUTPUT(:SERIAL,:NEWS,:NEWS)
REAL INPUT(NNODEX,NNODEY) CMF$LAYOUT INPUT(:NEWS,:NEWS)
OUTPUT(5,:,:) = INPUT
OUTPUT(4,:,:) = CSHIFT( OUTPUT(5,:,:), DIM=1, SHIFT=-1 ) OUTPUT(1,:,:) = CSHIFT( OUTPUT(4,:,:), DIM=2, SHIFT=-1 ) OUTPUT(7,:,:) = CSHIFT( OUTPUT(4,:,:), DIM=2, SHIFT=1 )
OUTPUT(6,:,:) = CSHIFT( OUTPUT(5,:,:), DIM=1, SHIFT=1 ) OUTPUT(3,:,:) = CSHIFT( OUTPUT(6,:,:), DIM=2, SHIFT=-1 ) OUTPUT(9,:,:) = CSHIFT( OUTPUT(6,:,:), DIM=2, SHIFT=1 )
OUTPUT(2,:,:) = CSHIFT( OUTPUT(5,:,:), DIM=2, SHIFT=-1 ) OUTPUT(8,:,:) = CSHIFT( OUTPUT(5,:,:), DIM=2, SHIFT=1 )
RETURN END
Table 7 shows the timings for the model problem. The first set of figures is for the code given in Figure . The second set of figures is for a version of the code in which the distribution and application steps were replaced by a call to a routine compiled with the stencil compiler. This is a special purpose compiler for compiling algorithms that can be coded as a stencil operation. Biondo Biondi used a pre-release version of this compiler to create the routine that I used.
There are several points to note: