The algorithm comprises 5 major routines:
The outer routines are parallel over time,
because the fast Fourier transform and the interpolation both
require communication between different time samples.
The inner routine is identical for each frequency plane.
The two-dimensional convolution
follows the outline proposed by Biondi (1991).
The offset k is laid out serially, because the convolution operator is
invariant along the h and axes, but depends on k according
to equation (8).
This data organization yields very efficient code on parallel machines, but it also requires a transpose step in which the data layout is changed.
On the CM, the intrinsic function 'reshape' transposes small data cubes. For data sets that cannot reside in the CM's distributed memory, the transpose could instead be implemented in the reads from and writes to disk.
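The two transpose strategies can be illustrated with a minimal sketch. It is written in Python rather than in the Fortran used on the CM, and it is not the implementation described here: a two-dimensional panel of 32-bit floats stands in for the data cube, and the names transpose_in_memory, transpose_via_disk and rows_per_block are hypothetical. The first function changes the layout of a panel that fits in memory; the second folds the same change into the disk traffic by reading a few rows at a time and writing each column of the block to its transposed position in the output file.

```python
import os
import tempfile
from array import array

ITEM = array("f").itemsize  # bytes per sample (32-bit floats assumed)


def transpose_in_memory(flat, n_slow, n_fast):
    """Return a flat copy in which the former fast axis becomes the slow axis."""
    out = array("f", [0.0] * len(flat))
    for i in range(n_slow):
        for j in range(n_fast):
            out[j * n_slow + i] = flat[i * n_fast + j]
    return out


def transpose_via_disk(src_path, dst_path, n_slow, n_fast, rows_per_block):
    """Transpose a panel too large for memory by reading it in row blocks."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.truncate(n_slow * n_fast * ITEM)  # pre-size the output file
        for i0 in range(0, n_slow, rows_per_block):
            nrows = min(rows_per_block, n_slow - i0)
            block = array("f")
            block.fromfile(src, nrows * n_fast)
            for j in range(n_fast):
                # Column j of this block is a contiguous run in output row j.
                run = block[j::n_fast]
                dst.seek((j * n_slow + i0) * ITEM)
                run.tofile(dst)


# Small check on a 3 x 4 panel: both routes should give the same layout.
n_slow, n_fast = 3, 4
panel = array("f", range(n_slow * n_fast))
in_memory = transpose_in_memory(panel, n_slow, n_fast)

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "panel.bin")
    dst_path = os.path.join(tmp, "panel_t.bin")
    with open(src_path, "wb") as f:
        panel.tofile(f)
    transpose_via_disk(src_path, dst_path, n_slow, n_fast, rows_per_block=2)
    via_disk = array("f")
    with open(dst_path, "rb") as f:
        via_disk.fromfile(f, n_slow * n_fast)

print(in_memory == via_disk)
```

In the disk-based variant only rows_per_block rows are held in memory at once, at the price of one seek per column per block; a larger block gives longer contiguous writes.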