This section concerns the implementation of a 3-D prestack time migration velocity semblance analysis on the CM. Velocity semblance panels are computed for each desired surface location. A migration divergence correction, obliquity weighting, vertical smoothing and time slice normalization are incorporated.

The algorithm accepts seismic trace data organized in any arbitrary manner.
These are stored on the CM in a 2-D array: `data(nt,ntrc)`, where
`nt` is the number of time samples per trace, and `ntrc` is the number of
traces from a 2-D or 3-D survey to be loaded into a data block for processing.
The `data` array is mapped out on the CM as parallel in the surface coordinate
dimension, and serial in the temporal direction: `data(:serial,:news)`.

The output is organized into velocity semblance panels per surface location.
These are stored on the CM in a 2-D array: `vpanels(ntau,nvtrc)`, where
`ntau` is the number of output time samples per velocity trace (usually
about `nt`/10). The dimension `nvtrc` is the total number of output
velocity traces, which can be expressed as `nvtrc` = `nv` × `nx` × `ny`,
where `nv` is
the number of migration velocities per semblance panel, and `nx`, `ny` are the
number of desired output velocity panels at 3-D in-line and cross-line
surface locations respectively.
The `vpanels` array is mapped out on the CM as parallel in the surface
coordinate dimension, and serial in the temporal direction:
`vpanels(:serial,:news)`.
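
The text does not state how the three indices are packed into the single parallel dimension of length `nvtrc`. As a minimal sketch, assuming that velocity varies fastest, followed by in-line and then cross-line position (a hypothetical ordering; the helper names `flatten_index` and `unflatten_index` are also illustrative and not from the original code), the bookkeeping might look like this:

```python
def flatten_index(iv, ix, iy, nv, nx, ny):
    """Map (velocity, in-line, cross-line) indices to one velocity-trace index.

    Assumed ordering (not stated in the text): iv fastest, then ix, then iy.
    """
    if not (0 <= iv < nv and 0 <= ix < nx and 0 <= iy < ny):
        raise IndexError("index out of range")
    return iv + nv * (ix + nx * iy)


def unflatten_index(itrc, nv, nx, ny):
    """Inverse of flatten_index under the same assumed ordering."""
    if not 0 <= itrc < nv * nx * ny:
        raise IndexError("velocity-trace index out of range")
    iv = itrc % nv
    ix = (itrc // nv) % nx
    iy = itrc // (nv * nx)
    return iv, ix, iy


# Values from the marine data example: 48 velocities at 32 in-line positions.
nv, nx, ny = 48, 32, 1
nvtrc = nv * nx * ny
```

Under this assumed ordering, the `nv` velocity traces of one semblance panel occupy adjacent positions along the parallel axis.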

To optimize CM performance, the dimensions `ntrc` and `nvtrc` should be equal
and a multiple of 512. As in the stacking velocity case,
their equality results in a 1:1 overlap of the
input and output arrays `data` and `vpanels`, which is optimal for CM
processor communications, and the factor of 512 ensures compiler
parallelization optimality. For the marine data example in this paper,
which consisted of a series of 48-fold CMP gathers,
I chose `ntrc` = 48 × 32 = 1,536 input traces.
In practical terms, the constraint on `ntrc` is
determined by the output dimension length `nvtrc`. For the marine data,
I chose `nv` = 48, `nx` = 32 (32 48-fold CMP gathers),
and `ny` = 1 (2-D seismic data). In general,
I would suggest making `nvtrc` as large as possible while keeping it
a multiple of 512. This involves deciding at how many surface positions
`nx` and `ny` to perform a velocity analysis, and how many migration
velocities `nv` to try at each surface position. I would then set the number
of input traces per data block equal to the total number of velocity traces:
`ntrc` = `nvtrc`. Since the input seismic trace length is usually about 10 times
longer than the output velocity trace length, the choice of these parameters
has to be balanced against the total memory available.
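
The sizing rule above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original implementation: the function names and the `max_nvtrc` cap (a stand-in for whatever the memory budget allows) are hypothetical.

```python
def check_block_sizes(nv, nx, ny, ntrc, multiple=512):
    """Report whether a parameter choice satisfies the two rules of thumb."""
    nvtrc = nv * nx * ny
    return {
        "nvtrc": nvtrc,
        "equal": ntrc == nvtrc,             # 1:1 overlap of data and vpanels
        "aligned": nvtrc % multiple == 0,   # parallel axis is a multiple of 512
    }


def largest_nx(nv, ny, max_nvtrc, multiple=512):
    """Largest nx with nv*nx*ny <= max_nvtrc and divisible by `multiple`.

    `max_nvtrc` is a hypothetical cap chosen by the user from the memory
    budget. Returns None if no positive nx satisfies the rule.
    """
    for nx in range(max_nvtrc // (nv * ny), 0, -1):
        if (nv * nx * ny) % multiple == 0:
            return nx
    return None


# Marine data example from the text: nv = 48, nx = 32, ny = 1, ntrc = 1,536.
marine = check_block_sizes(nv=48, nx=32, ny=1, ntrc=1536)
```

With `nv` = 48, the product is a multiple of 512 only when `nx` × `ny` is a multiple of 32, which is consistent with the choice of 32 CMP gathers in the marine example.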

Deviation from these rules of thumb can result in a significant loss of performance. For a small memory machine like our present CM-2 configuration (8k processors, 64 Mb memory), I could process the entire data set in single blocks of 6,144 arbitrarily gathered traces, at 2,000 samples per trace, by sequentially loading data blocks and iterating the velocity analysis processing sequence. This would yield an output velocity analysis at each of 128 surface positions, each semblance panel containing 48 migration velocities and 256 time samples.
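
As a rough plausibility check of the block size quoted above, the storage for the two main arrays can be estimated directly. This sketch assumes 4-byte single-precision samples and reads "64 Mb" as 64 megabytes. Both are assumptions, and the estimate ignores the auxiliary arrays (energy, weights, coordinates) that the algorithm also needs.

```python
BYTES_PER_SAMPLE = 4  # assumption: single-precision reals


def block_bytes(nt, ntrc, ntau, nvtrc, bytes_per_sample=BYTES_PER_SAMPLE):
    """Bytes needed for data(nt,ntrc) and vpanels(ntau,nvtrc)."""
    data_bytes = nt * ntrc * bytes_per_sample
    vpanels_bytes = ntau * nvtrc * bytes_per_sample
    return data_bytes, vpanels_bytes


# Figures quoted in the text for the 8k-processor CM-2 configuration.
nt, ntrc = 2000, 6144          # samples per trace, traces per block
nv, npos, ntau = 48, 128, 256  # velocities, surface positions, output samples
nvtrc = nv * npos              # 48 * 128 = 6144, equal to ntrc

data_bytes, vpanels_bytes = block_bytes(nt, ntrc, ntau, nvtrc)
total_bytes = data_bytes + vpanels_bytes

memory_bytes = 64 * 2**20      # assumption: "64 Mb" means 64 megabytes
fits = total_bytes <= memory_bytes
```

Under these assumptions the input block dominates the budget (about 49 MB against about 6 MB for the output), which illustrates the memory trade-off between trace length and block size.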

The CM migration velocity analysis algorithm is based on a
3-D Kirchhoff prestack time migration algorithm previously developed
for the CM (Lumley and Biondi, 1991).
For each data block, the migration velocity semblance panels are computed
as follows. Each input trace *s*_{i} is migrated into a single surface
location *x*_{i} at a single migration velocity *v*_{i}.
This step is accomplished in parallel over
the trace dimension for a fixed time slice, and serially over all time slices.
Once this step is complete, the output array `vpanels` and the velocity and
output coordinate information are `CSHIFT`ed once over the parallel dimension,
a shift that is relative to the input traces.
After the shift, a seismic input trace *s*_{1} that was aligned in-processor
with velocity trace

This process is repeated until
all traces are back in their initial configuration, which requires
`nh` circular shifts. Again, the input data could be shifted, rather
than the semblance panels, but it is quicker to shift the smaller of the
two arrays to minimize interprocessor communication. Finally, an array
of squared velocity migration stack energy is maintained, together with
offset weights, to compute semblance. Divergence corrections, vertical smoothing,
and time-slice normalization are also incorporated, which complicates the
simplified description given above.
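
The shift-and-accumulate pattern described above can be illustrated with a serial toy model. This is only a sketch under stated assumptions: plain Python lists stand in for the parallel axis, `cshift` mimics the effect of a Fortran `CSHIFT` by one position, each trace contributes a single fixed amplitude instead of a migrated time slice, and the semblance is taken in its conventional form (squared stack divided by the number of traces times the stacked energy). None of the names below come from the original code, and divergence correction, weighting, smoothing, and normalization are omitted.

```python
def cshift(values, shift=1):
    """Circularly shift a list so that result[i] = values[(i + shift) % n]."""
    shift %= len(values)
    return values[shift:] + values[:shift]


def run_sweep(trace_amps):
    """Pair every input trace with every output slot using circular shifts.

    trace_amps: one toy amplitude per processor (the input stays in place).
    Returns the output slots after the full sweep and the visited pairs.
    """
    n = len(trace_amps)
    slots = [{"id": j, "stack": 0.0, "energy": 0.0, "count": 0}
             for j in range(n)]
    pairs = []
    for _ in range(n):
        # Conceptually parallel: each processor p updates its aligned slot.
        for p in range(n):
            slot = slots[p]
            amp = trace_amps[p]  # placeholder for the migrated amplitude
            slot["stack"] += amp
            slot["energy"] += amp * amp
            slot["count"] += 1
            pairs.append((p, slot["id"]))
        # Shift the smaller (output) array relative to the input traces.
        slots = cshift(slots, 1)
    return slots, pairs


def semblance(slot):
    """Conventional semblance: stack**2 / (count * energy)."""
    denom = slot["count"] * slot["energy"]
    return slot["stack"] ** 2 / denom if denom > 0 else 0.0


slots, pairs = run_sweep([1.0, 1.0, 1.0, 1.0])
```

In this toy model, `n` shifts over `n` slots return the output array to its initial configuration, and every (trace, slot) pair is visited exactly once along the way.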

11/18/1997