next up previous print clean
Next: Constructors and destructors Up: R. Clapp: MPI in Previous: Collection

Automatic Parallelization

The library routines described earlier are effective in many situations. They become difficult to use when the distribution requirements become complex and inefficient when IO transfer times dominate processing times. Another set of routines is available when facing a task for which the first set of routines does not provide an effective solution.

The idea of this set of routines is twofold. First, a tag is distributed along some axis in blocks. This distribution can take two forms: sequential (SEQUENTIAL) and combined (COMBINED). Figure [*] illustrates the two distribution patterns. The second principle is that there will be three thread types. The master thread will distribute and collect data. Each slave processor will then run two threads simultaneously. The first will simply be concerned with receiving and sending IO to the master node. The second will be processing the data. IO threads will notify its processing thread whenever it has finished reading in a block. The processing thread will then read in its local input, process the data and notify the IO thread that it is safe to transfer back the information to the master node. This will generally reduce the processing time from

ttot = 2 * ti + to + tproc



ttot = min( ti to , tproc) + tedge


where ttot is the total processing time, ti is the time to send the input, to is the time to send the output, tedge is the time to transfer first input and last output block, and tproc is the total processing time. This methodology is most effective if the master thread can handle higher bandwidth than the slave threads.

The routines to do MPI in this manner can be broken into four portions. First, constructors and destructors, which begin and end the MPI processing and describe how the data is distributed. The second batch of functions provide information to the programmer about what a given thread will be doing. The final two groups are specific to processing and IO threads.

Figure 1
The different distribution patterns. The top-left panel shows a dataset distributed into 4 parts along the first axis (you would normally never distribute along the first axis but the argument holds true for any axis distribution < ndim) . In the top-right panel note that sections A and C go to processor 1 while B and D go to processor 2. The bottom two panels show the two possible distribution patterns. In the combined distribution the blocks retain their normal relationship. In the sequential distribution the processed blocks are written sequentially.

next up previous print clean
Next: Constructors and destructors Up: R. Clapp: MPI in Previous: Collection
Stanford Exploration Project