Currently, datasets are collected only at the completion of the job. It would be useful to be able to concatenate at various time intervals. This would make QCing easier and make the job less susceptible to failing disks, which currently require rerunning portions of the job. The difficulty in implementing this feature arises when the output file is of the COPY type. The file is continually updated by processes running on the node, so in order to get an accurate picture of what has been collected, all writing to the node would have to be frozen and/or it would have to be ensured that no process is running on the node. The second change is to the collection methodology itself. Currently a binary-tree sum is performed using the MPI_collect routine. For large datasets this approach quickly swamps the network switch; a more efficient collection method should be implemented.
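One candidate for a more switch-friendly collection is a ring-style reduce-scatter, in which each node exchanges only a 1/n-th slice of the data with its neighbor per step, so per-link traffic stays constant instead of concentrating large messages at the top of a binary tree. The sketch below simulates the schedule locally with plain lists; the function name and data layout are illustrative assumptions, not the library's actual interface.

```python
def ring_reduce_scatter_sum(node_data):
    """Simulate a ring reduce-scatter sum over equal-length arrays.

    node_data: one list of numbers per node (all the same length).
    After n-1 steps, node i holds the fully summed chunk (i + 1) % n.
    (Illustrative sketch only -- not the library's real collection routine.)
    """
    n = len(node_data)                       # number of nodes in the ring
    m = len(node_data[0])
    assert m % n == 0, "data length must divide evenly into n chunks"
    chunk = m // n
    data = [list(d) for d in node_data]      # local working copy per node

    for s in range(n - 1):
        # Snapshot outgoing chunks first: all sends in a step occur "at once".
        out = []
        for i in range(n):
            c = (i - s) % n                  # chunk node i forwards this step
            out.append(data[i][c * chunk:(c + 1) * chunk])
        # Each node adds the received chunk from its ring predecessor.
        for i in range(n):
            c = (i - s) % n
            dst = (i + 1) % n
            for k in range(chunk):
                data[dst][c * chunk + k] += out[i][k]

    # Return the fully reduced chunk owned by each node.
    return {i: data[i][((i + 1) % n) * chunk:((i + 1) % n + 1) * chunk]
            for i in range(n)}
```

A full concatenation would follow the reduce-scatter with an all-gather around the same ring, again moving only one chunk per link per step.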
Another feature would be to expand on the current COPY, BLOCK, and SPREAD methods. It would be useful to be able to distribute over multiple axes, and to have the distributed portions overlap (patching).
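A multi-axis distribution with overlap could be described by a window list per axis, with the full set of patches given by the Cartesian product across axes. The following is a minimal sketch of that bookkeeping; the function name and parameters are hypothetical, not part of the existing COPY/BLOCK/SPREAD interface.

```python
import itertools

def patches(shape, nblocks, overlap):
    """Return (start, stop) windows for an overlapping block decomposition.

    shape   : global size along each axis
    nblocks : number of blocks along each axis
    overlap : extra samples padded on each side of a block, per axis
    (Hypothetical helper illustrating multi-axis "patching".)
    """
    per_axis = []
    for size, nb, ov in zip(shape, nblocks, overlap):
        step = size // nb                    # nominal block length
        wins = []
        for b in range(nb):
            lo = max(0, b * step - ov)       # pad left, clipped at the edge
            # last block absorbs any remainder; pad right, clipped at the edge
            hi = size if b == nb - 1 else min(size, (b + 1) * step + ov)
            wins.append((lo, hi))
        per_axis.append(wins)
    # One window tuple per patch, covering every axis combination.
    return list(itertools.product(*per_axis))
```

For example, a 100-sample axis split into four blocks with an overlap of 10 yields windows (0, 35), (15, 60), (40, 85), (65, 100): neighboring patches share samples, which is what patch-based processing needs at block boundaries.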
Finally, a couple of changes to the way the machines are chosen need to be made. Instead of the list of available machines being read from a file, it would be useful to have it read from a global server. In this way you could have dynamic control over the number of nodes a job is running on. In addition, in some special cases it is necessary to guarantee that jobs run on specific nodes. Wave-equation migration velocity analysis (Sava and Biondi, 2003) is an example of this: the downward-continued wavefield is pre-stored on the nodes, so you must be able to ensure that each frequency is sent to the node that contains the wavefield at that frequency.
The object-oriented way in which the library was implemented makes these changes relatively easy. For example, ensuring that a specific job is sent to a specific node would involve inheriting from the parallel job class and overriding the routine that matches jobs with available machines.
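The inheritance approach can be sketched as follows. The class and method names (ParallelJob, match_job, freq_to_node) are invented for illustration; the library's real names and signatures will differ.

```python
class ParallelJob:
    """Hypothetical base class: assign each job to any free machine."""

    def __init__(self, machines):
        self.machines = machines

    def match_job(self, job, free):
        # Default policy: take the first available node.
        return free[0]


class PinnedJob(ParallelJob):
    """Ensure each frequency job runs on the node that already stores its
    downward-continued wavefield (the wave-equation MVA use case)."""

    def __init__(self, machines, freq_to_node):
        super().__init__(machines)
        self.freq_to_node = freq_to_node     # e.g. {freq_index: hostname}

    def match_job(self, job, free):
        node = self.freq_to_node[job]        # the required node for this job
        if node not in free:
            return None                      # hold the job until node frees up
        return node
```

Only the matching routine is overridden; the rest of the parallel machinery (dispatch, collection, restart) is reused unchanged, which is the point of the object-oriented design.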