The memory layout for processing time slices is illustrated
in Figure . The time slices are local to each processor
and the time axis is the parallel dimension.
Thus, the number of processors needed is the number of time samples,
and the memory of each processor must be large enough to load and
process one time slice. When the amount of data exceeds the memory
available, the process can run on lumps of data. The lumps may
be pieces of data cut in the (*x*,*y*) space, preserving the parallel axis
and relieving the processors' memory, or they may be cut in time,
shortening the parallel axis but requiring a smaller area of overlap
between successive lumps.
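The two lump decompositions can be made concrete with a short Python sketch. The helper names `xy_lumps` and `time_lumps`, and the choice to cut the (*x*,*y*) plane along *x* only, are illustrative assumptions, not part of the original scheme:

```python
import numpy as np

def xy_lumps(nx, n_lumps):
    """Cut the data in (x,y) space (here along x only, as a simplification).
    The time axis, and hence the parallel dimension, is untouched; each
    processor simply holds a smaller piece of its time slice."""
    edges = np.linspace(0, nx, n_lumps + 1).astype(int)
    return [(int(a), int(b)) for a, b in zip(edges[:-1], edges[1:])]

def time_lumps(nt, n_lumps, overlap):
    """Cut the data in time: the parallel axis is shortened, and successive
    lumps must share `overlap` samples so contributions that cross a lump
    boundary are not lost."""
    core = nt // n_lumps
    lumps = []
    for i in range(n_lumps):
        start = max(0, i * core - overlap)
        stop = min(nt, (i + 1) * core + overlap)
        lumps.append((start, stop))
    return lumps
```

For example, `time_lumps(100, 4, 5)` yields four pieces of the time axis whose neighbors share a ten-sample overlap zone, while `xy_lumps(100, 4)` yields four disjoint spatial pieces.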

[Figure: tsproc]
The top-most drawing
represents the elliptic dip-limited DMO operator.
I assume that the offset line bisects the x- and y-axes of Figure 3
on the Earth's surface. Below, the two grids represent the
data layout inside the processors. Each processor contains
a time slice of data. Processor 1 contains the time slice
at t=t_0 and performs the data communication across the (*x*,*y*)
space that corresponds to a vertical shift from t_0 to t_1.
During this time, processor 2 performs the same kind of operation
for a vertical shift from t_1 to t_2. Then, processor 1
sums its output into the output volume and communicates its input
to processor 2. The action is then repeated, moving the data
across the (*x*,*y*) space for a vertical shift from t_0 to t_2.
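A serial Python sketch of this pipeline may help. The names `shift_operator` and `time_slice_pipeline` are hypothetical, the operator is replaced by an identity placeholder, and the loop over processors is simulated rather than distributed over ranks; it shows only how each output slice accumulates contributions arriving from earlier times:

```python
import numpy as np

def shift_operator(slice_xy, t_from, t_to):
    # Stand-in for the in-processor data movement across (x,y) that maps a
    # contribution from input time t_from onto output time t_to. A real DMO
    # operator would spread energy over its (x,y) aperture; here we use the
    # identity as a placeholder.
    return slice_xy

def time_slice_pipeline(data):
    """data[t] is the input time slice initially held by processor t.
    Each input slice travels up the time axis, one processor per step,
    and every processor it visits sums the shifted contribution into
    its own output slice."""
    nt = len(data)
    output = [np.zeros_like(s) for s in data]
    for t0 in range(nt):             # input slice t0 enters the pipeline
        for tp in range(t0, nt):     # ...and visits processors t0, t0+1, ...
            output[tp] += shift_operator(data[t0], t0, tp)
    return output
```

In a distributed run, the inner loop would instead be realized by each processor sending its current input slice to the next processor up the time axis after summing its contribution, which is the single-direction communication pattern described below.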

Unlike the previous algorithm, data communication is performed
in a single direction, up the time axis, and does not depend on
the offset and azimuth distribution. The long-range and chaotic
data movements in the (*x*,*y*) space take place in-processor.
Thus, processing time slices results in more efficient
inter-processor communication than trace processing does.

11/16/1997