The idea behind this set of routines is twofold. First, a tag is distributed along some axis in blocks. This distribution can take two forms: sequential (SEQUENTIAL) and combined (COMBINED); Figure 1 illustrates the two patterns. Second, there are three thread types. The master thread distributes and collects the data. Each slave processor then runs two threads simultaneously: an IO thread, which is concerned solely with receiving data from and sending data to the master node, and a processing thread, which processes the data. The IO thread notifies its processing thread whenever it has finished reading in a block. The processing thread then reads its local input, processes the data, and notifies the IO thread that it is safe to transfer the result back to the master node. This generally reduces the total time from
    t_tot = 2 * t_i + t_o + t_proc                (1)

to

    t_tot = min(t_i + t_o, t_proc) + t_edge       (2)

where t_i and t_o are the input and output times, t_proc is the processing time, and t_edge accounts for the start-up and drain of the pipeline at its edges.
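The IO/processing overlap described above can be sketched as follows. This is a minimal illustration, not the library's actual implementation: MPI message exchange with the master is simulated here with in-process queues, and all names are invented for the example. The IO thread primes the pipeline with one block, so that reading block n+1 overlaps with processing block n.

```python
import threading
import queue

def run_slave(blocks, process):
    """Sketch of the per-slave thread pair: an IO thread feeds blocks to a
    processing thread and collects the results for return to the master."""
    inbox = queue.Queue(maxsize=1)   # IO -> processing handoff
    outbox = queue.Queue(maxsize=1)  # processing -> IO handoff
    results = []

    def io_thread():
        # In the real routines this thread would exchange MPI messages with
        # the master node; queue puts/gets stand in for those transfers.
        it = iter(blocks)
        first = next(it, None)
        if first is not None:
            inbox.put(first)              # prime the pipeline
            for block in it:
                inbox.put(block)          # overlaps with processing of the
                results.append(outbox.get())  # previous block
            results.append(outbox.get())  # drain the last result
        inbox.put(None)                   # sentinel: no more input

    def processing_thread():
        while True:
            block = inbox.get()           # wait for IO to deliver a block
            if block is None:
                break
            outbox.put(process(block))    # notify IO: safe to send back

    t_io = threading.Thread(target=io_thread)
    t_proc = threading.Thread(target=processing_thread)
    t_io.start(); t_proc.start()
    t_io.join(); t_proc.join()
    return results

print(run_slave([1, 2, 3], lambda b: b * 10))  # [10, 20, 30]
```

Because the queues have capacity one, the IO thread can stay at most one block ahead of the processing thread, which is exactly the double-buffered behaviour that yields equation (2).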
The routines that implement MPI in this manner fall into four groups. First are the constructors and destructors, which begin and end the MPI processing and describe how the data is distributed. The second group provides information to the programmer about what a given thread will be doing. The final two groups are specific to the processing and IO threads, respectively.
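The calling sequence implied by these four groups can be sketched as below. All routine names here (`dist_init`, `dist_my_role`, and so on) are hypothetical placeholders, since the text does not give the library's actual identifiers; the sketch only shows which group each thread type touches.

```python
from enum import Enum

class Role(Enum):
    MASTER = 0      # distributes and collects data
    IO = 1          # slave thread exchanging blocks with the master
    PROCESSING = 2  # slave thread transforming the data

def lifecycle(role):
    """Return the (hypothetical) routines a thread of the given role calls,
    grouped as: (1) constructor/destructor, (2) thread information,
    (3) processing-thread calls, (4) IO-thread calls."""
    calls = ["dist_init"]            # group 1: set up, describe distribution
    calls.append("dist_my_role")     # group 2: what will this thread do?
    if role is Role.PROCESSING:
        calls += ["dist_wait_block", "dist_block_done"]   # group 3
    elif role is Role.IO:
        calls += ["dist_recv_block", "dist_send_block"]   # group 4
    calls.append("dist_finalize")    # group 1: tear down MPI processing
    return calls
```

A master thread would thus call only the constructor/destructor and information routines, while each slave thread additionally calls the routines of its own group.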
Figure 1 The different distribution patterns. The top-left panel shows a dataset distributed into 4 parts along the first axis (you would normally never distribute along the first axis, but the argument holds for any distributed axis < ndim). In the top-right panel note that sections A and C go to processor 1 while B and D go to processor 2. The bottom two panels show the two possible distribution patterns: in the combined distribution the blocks retain their normal relationship, while in the sequential distribution the processed blocks are written sequentially.
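The two output orderings in Figure 1 can be made concrete with a small sketch. Assuming the cyclic block-to-processor assignment shown in the figure (blocks A and C to processor 1, B and D to processor 2) and using invented names, the output position of each block differs between the two patterns:

```python
def output_order(nblocks, nprocs, pattern):
    """Return block indices in the order they appear in the output,
    assuming a cyclic assignment of blocks to processors (as in Figure 1)."""
    owner = [b % nprocs for b in range(nblocks)]  # cyclic assignment
    if pattern == "COMBINED":
        # blocks keep their original positions in the output dataset
        return list(range(nblocks))
    if pattern == "SEQUENTIAL":
        # each processor's blocks are written out contiguously, in turn
        return [b for p in range(nprocs)
                  for b in range(nblocks) if owner[b] == p]
    raise ValueError(f"unknown pattern: {pattern}")

print(output_order(4, 2, "COMBINED"))    # [0, 1, 2, 3]  i.e. A B C D
print(output_order(4, 2, "SEQUENTIAL"))  # [0, 2, 1, 3]  i.e. A C B D
```

With four blocks and two processors this reproduces the figure: the combined pattern leaves A B C D in place, while the sequential pattern writes processor 1's blocks (A, C) followed by processor 2's (B, D).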