## Discussion

The speeds obtained here apply only to regularly connected meshes. I would expect irregular meshes to run more slowly because of the higher cost of the distribution step. Two alternatives could be considered for irregularly connected meshes. The first is to use a 1-D array of nodes and distribute data using a global indexing scheme; Mathur (1990) suggested that, with this approach, the indexing arrays should be randomly constructed so that global communication conflicts are reduced. The second is to map the irregular mesh onto a regular 2-D mesh with some elements missing. There would be a computational overhead proportional to the number of missing nodes, but the overall cost might be reduced because of the speed of the regular communications. This idea is illustrated in Figure 4.

Figure 4: Construction of an enlarged computational mesh to ensure regular connections between nodes.
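
The two alternatives for irregular meshes can be sketched in a few lines of serial Python. This is only an illustration under stated assumptions, not the code used for the timings: the function names are hypothetical, the "regular communication" is modeled by plain nearest-neighbor index shifts, and "randomly constructed" indexing is read here as a random permutation of storage slots.

```python
import random

def embed_in_regular_grid(active_nodes, nrows, ncols):
    """Return a mask grid: True where a real node exists, False for padding."""
    mask = [[False] * ncols for _ in range(nrows)]
    for i, j in active_nodes:
        mask[i][j] = True
    return mask

def neighbor_sum(values, mask):
    """Sum the four nearest neighbors of every grid point using regular shifts.

    Padding points are visited like any other point (this is the overhead);
    they contribute nothing to their neighbors and their own result is zero.
    """
    nrows, ncols = len(mask), len(mask[0])
    out = [[0.0] * ncols for _ in range(nrows)]
    for i in range(nrows):
        for j in range(ncols):
            total = 0.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < nrows and 0 <= nj < ncols and mask[ni][nj]:
                    total += values[ni][nj]
            out[i][j] = total if mask[i][j] else 0.0
    return out

def padding_fraction(mask):
    """Fraction of grid points that are padding, i.e. wasted work."""
    total = sum(len(row) for row in mask)
    active = sum(cell for row in mask for cell in row)
    return (total - active) / total

def random_global_index(n_nodes, seed=0):
    """Assumed reading of the 1-D alternative: a random permutation that
    maps node number -> storage slot in a 1-D array."""
    rng = random.Random(seed)
    slots = list(range(n_nodes))
    rng.shuffle(slots)
    return slots
```

For an L-shaped set of 7 nodes embedded in a 3 by 3 grid, `padding_fraction` reports that 2 of the 9 grid points are padding, which is the overhead to be weighed against the faster regular communication.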

If, as in the model problem, the grid and the domain are both rectangular, the operator produced by the FE method is similar to a finite difference operator. For this type of very regular problem it is probably wiser to use a finite difference method. However, the operator calculation code that I implemented does not assume this rectangular layout. The elements must be connected in a regular mesh, but they may have arbitrary shapes. This case is related to the conformally mapped finite difference scheme described by Fornberg (1989). Each method has its advantages: the FE code spends more time constructing the operator, but boundary conditions are generally much easier to implement.
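
The similarity to a finite difference operator can be checked on a small patch. The sketch below assumes the Laplace operator discretized with bilinear elements on a uniform square grid; neither assumption is stated in this section, and the function names are hypothetical. It assembles the four elements around one interior node and reads off that node's row of the operator as a 3 by 3 stencil.

```python
from fractions import Fraction

# Stiffness matrix of the Laplacian on one square bilinear element, with
# local nodes numbered counter-clockwise from the lower-left corner.
# In 2-D this matrix does not depend on the element size.
KE = [[Fraction(v, 6) for v in row] for row in (
    ( 4, -1, -2, -1),
    (-1,  4, -1, -2),
    (-2, -1,  4, -1),
    (-1, -2, -1,  4),
)]

def assemble_patch():
    """Assemble the 2x2-element patch (3x3 nodes) around one interior node."""
    n = 3
    K = [[Fraction(0)] * (n * n) for _ in range(n * n)]
    for ey in range(2):
        for ex in range(2):
            # Global node numbers of the element corners, counter-clockwise.
            corners = [ey * n + ex, ey * n + ex + 1,
                       (ey + 1) * n + ex + 1, (ey + 1) * n + ex]
            for a in range(4):
                for b in range(4):
                    K[corners[a]][corners[b]] += KE[a][b]
    return K

def interior_stencil():
    """Row of the assembled operator for the centre node, as a 3x3 stencil."""
    row = assemble_patch()[4]          # node 4 is the centre of the patch
    return [row[0:3], row[3:6], row[6:9]]

if __name__ == "__main__":
    for line in interior_stencil():
        print([str(v) for v in line])
```

Under these assumptions the result is a nine-point stencil with 8/3 at the centre and -1/3 at each of the eight neighbors. The familiar five-point finite difference Laplacian has 4 at the centre and -1 at the four edge neighbors, so the two operators have the same character but are not identical.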

One common procedure for speeding up the CG method is to use a preconditioning operator. A diagonal preconditioner is very easy to apply. If a better preconditioner is desired, the choice of method depends on the layout of the data. If the global Galerkin operator is constructed, an incomplete Cholesky preconditioner can be used. When the unassembled system consists of element matrices, the preconditioning is usually done with element-by-element operators (Lee and Wathen, 1989). This is not simple to do with the row-wise decomposition that I propose, but I believe that it should be possible to construct an efficient preconditioning operator with the row-wise data layout.
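
To show why the diagonal preconditioner is so easy to apply, here is a minimal serial sketch of diagonally preconditioned CG. It is a textbook formulation, not the parallel implementation discussed in this paper; the names `pcg_diagonal`, `apply_A`, and `diag` are hypothetical, and the operator is assumed to be symmetric positive definite with its diagonal available as a list.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg_diagonal(apply_A, diag, b, tol=1e-10, max_iter=200):
    """Conjugate gradients preconditioned by M = diag(A).

    apply_A : function returning the product A x for an SPD operator A
    diag    : list holding the diagonal entries of A
    Returns the solution estimate and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)                                   # residual b - A x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]      # apply M^{-1}: one divide per node
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    matvec = lambda v: [dot(row, v) for row in A]
    x, iters = pcg_diagonal(matvec, [A[i][i] for i in range(3)], [6.0, 10.0, 8.0])
    print(x, iters)
```

The only extra work per iteration is an elementwise division by the diagonal, which needs no communication between nodes. That is why it fits any data layout, whereas incomplete Cholesky or element-by-element preconditioners depend on how the operator is stored.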

Stanford Exploration Project
12/18/1997