This section concerns the implementation of a stacking velocity semblance analysis on the CM. Velocity semblance panels are computed for each desired CMP bin location. A divergence correction, offset weighting, obliquity weighting, vertical smoothing, and time-slice normalization are incorporated.
The algorithm accepts seismic trace data organized into CMP gathers/bins. These are stored on the CM in a 3-D array, data(nt,nh,nm), where nt is the number of time samples per trace, nh is the number of offsets per CMP bin, and nm is the number of CMPs to be processed. The data array is laid out on the CM as parallel in the offset and midpoint dimensions and serial in the temporal direction: data(:serial,1000:news,:news). The offset dimension carries a weight of 1000, which instructs the CMF compiler to place the offset traces of each CMP gather in adjacent processors. This weight improves inter-processor communication during the CSHIFT operations, as discussed later.
The output is organized into velocity semblance panels, one per CMP location. These are stored on the CM in a 3-D array, vpanels(ntau,nv,nm), where ntau is the number of output time samples per velocity trace (usually about nt/10) and nv is the number of stacking velocities per panel. The vpanels array is laid out on the CM as parallel in the velocity and midpoint dimensions and serial in the temporal direction: vpanels(:serial,1000:news,:news). The velocity dimension is likewise weighted by 1000, to place the velocity traces of each semblance panel in adjacent processors.
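In modern array terms the two declarations might look as follows. This is a NumPy sketch, not the paper's CM Fortran: the sizes are taken from the marine example discussed below, and the CM layout weights have no NumPy counterpart, so axis order alone records the layout intent.

```python
import numpy as np

# Illustrative sizes from the marine example: 2,000 time samples per trace,
# 48-fold gathers, 48 stacking velocities, 32 gathers per block.
nt, nh, nm = 2000, 48, 32
ntau, nv = nt // 10, nh          # ntau is usually about nt/10

# Input CMP gathers: the serial (in-processor) time axis comes first,
# followed by the parallel offset and midpoint axes.  The 1000:news weight
# on the offset axis survives here only as this comment.
data = np.zeros((nt, nh, nm), dtype=np.float32)

# Output semblance panels: serial time, then parallel velocity and midpoint.
vpanels = np.zeros((ntau, nv, nm), dtype=np.float32)
```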
To optimize CM performance, the products nh·nm and nv·nm should be equal and a multiple of 512. Their equality results in a 1:1 overlap of the input and output arrays data and vpanels, which is optimal for CM processor communications, and the factor of 512 ensures that the compiler can parallelize the arrays fully. For the marine data example in this paper, which consisted of a series of 48-fold CMP gathers, I chose nh = nv = 48 and nm = 32. For other fold counts (e.g., 60-fold), I would choose nh to be the next higher power of two and pad each gather with zero traces. It is also very efficient to choose the number of stacking velocities nv equal to the fold nh, to keep the input and output arrays aligned. Deviating from these rules of thumb can cause a significant loss of performance. The number of gathers nm is determined by the amount of available memory. On a small-memory machine like our present CM-2 configuration (8k processors, 64 Mb memory), I could process the entire data set in blocks of 128 48-fold CMP gathers (2,000 samples per trace), sequentially loading data blocks and iterating the velocity analysis processing sequence.
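The padding rule for an arbitrary fold could be sketched as follows. The helper names are my own, hypothetical choices; the paper does not give this code.

```python
import numpy as np

def next_pow2(n):
    """Smallest power of two >= n; e.g. a 60-fold gather pads to 64 offsets."""
    p = 1
    while p < n:
        p *= 2
    return p

def pad_gather(gather, nh_target):
    """Pad an (nt, nh) CMP gather with zero traces out to nh_target offsets."""
    nt, nh = gather.shape
    out = np.zeros((nt, nh_target), dtype=gather.dtype)
    out[:, :nh] = gather
    return out
```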
For each data block, the stacking velocity semblance panels are computed as follows. Each offset trace of every gather is stretched with an offset-dependent stacking velocity, and stacked into its appropriate semblance panel. This operation is done in parallel across all offsets and gathers, and serially in time.
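A serial sketch of this stretch-and-stack pass follows, assuming linear interpolation for the NMO stretch (the paper does not specify the interpolator) and with the CM's parallel offset and midpoint dimensions written as explicit loops; the function names are illustrative only.

```python
import numpy as np

def nmo_stretch(trace, x, v, t, tau):
    """NMO-correct one offset trace to zero-offset time: for each output time
    tau0 the corrected sample lives at t = sqrt(tau0^2 + (x/v)^2)."""
    tsrc = np.sqrt(tau**2 + (x / v)**2)
    return np.interp(tsrc, t, trace, left=0.0, right=0.0)

def initial_stack(data, offsets, vels, t, tau):
    """First pass: offset trace i is stretched with the velocity trace v_i
    that initially shares its processor, and stacked into its panel."""
    nt, nh, nm = data.shape
    panels = np.zeros((len(tau), nh, nm))
    for m in range(nm):              # parallel over midpoints on the CM
        for h in range(nh):          # parallel over offsets on the CM
            panels[:, h, m] += nmo_stretch(data[:, h, m], offsets[h],
                                           vels[h], t, tau)
    return panels
```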
Then, within each semblance panel, the velocity traces are circularly shifted with the CSHIFT intrinsic, such that the lowest stacking-velocity trace v1 moves to the v2 position in the same CMP bin, and the highest-velocity trace vn moves to the lowest-velocity position v1 in the next CMP bin. The shift moves the panels relative to the stationary input traces: before the shift, seismic trace s2 resides in the same processor as velocity trace v2, and so on; after the shift, seismic trace s2 shares a processor with velocity trace v1. Each offset trace in each CMP gather is then stretched with its new velocity and stacked into the semblance panel.
This process is repeated until all traces are back in their initial configuration, which requires nh circular shifts. Note that the input data could be shifted instead of the semblance panels, but it is quicker to shift the smaller of the two arrays to minimize interprocessor communication. Finally, an array of squared velocity-stack energy is maintained to compute the semblance and the offset weights. The incorporation of divergence corrections, vertical smoothing, and time-slice normalization complicates this simplified description.
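The full rotate-and-stack procedure described above can be sketched as a serial NumPy translation. The circular CSHIFT is modeled by rotating an index array, the linear interpolator is my assumption, and the last line forms a per-sample semblance from the stack and energy arrays, without the vertical smoothing, normalization, and weighting the paper adds:

```python
import numpy as np

def velocity_semblance(data, offsets, vels, t, tau):
    """Rotate-and-stack velocity semblance.
    data: (nt, nh, nm) CMP gathers; returns (ntau, nv, nm) panels, nv == nh."""
    nt, nh, nm = data.shape
    nv = len(vels)
    assert nv == nh, "the rotation scheme assumes nv == nh"
    num = np.zeros((len(tau), nv, nm))       # stacked amplitude
    den = np.zeros_like(num)                 # stacked squared amplitude
    vel_idx = np.arange(nv)                  # velocity trace in each processor
    for _ in range(nh):                      # nh circular shifts in all
        for m in range(nm):                  # parallel over midpoints on the CM
            for h in range(nh):              # parallel over offsets on the CM
                j = vel_idx[h]               # velocity sharing this processor
                # NMO stretch: pull the sample at t = sqrt(tau^2 + (x/v)^2).
                tsrc = np.sqrt(tau**2 + (offsets[h] / vels[j])**2)
                s = np.interp(tsrc, t, data[:, h, m], left=0.0, right=0.0)
                num[:, j, m] += s
                den[:, j, m] += s * s
        vel_idx = np.roll(vel_idx, 1)        # CSHIFT: rotate panels past data
    # Per-sample semblance: coherent stack energy over total energy.
    return num**2 / (nh * den + 1e-30)
```

A flat event recorded identically at every offset stacks coherently for any trial velocity when the offsets are zero, so its semblance approaches one everywhere the signal is nonzero.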