Next: A 3-D SYNTHETIC EXAMPLE Up: THE CONNECTION MACHINE ALGORITHM Previous: Algorithm geometry

## The CM Fortran 90 code

The basic outline of our CM migration algorithm follows. First, a large swath of the input trace data is loaded into the array DATA(nt,nx*ny), and the trace coordinate headers are loaded into their respective vectors of the form Xr(nx*ny). To enhance I/O speed, we transpose the trace data on the front-end using the SEPLIB routine Transp. Then we load the trace data in parallel from the front-end to the CM using the block data transfer subroutine cmf_fe_array_to_cm and the temporary front-end vector fe_temp(nx*ny):

```
do 300 it=1,nt
irec= (ishot-1)*nt+it
call cmf_fe_array_to_cm(data(it,:),fe_temp)
300    continue
```

Next, the required migration surface geometry distances are calculated in parallel, given the source and receiver surface coordinates (xs,ys), (xr,yr), and the image volume surface coordinates (xi,yi):

```
c         CALC DISTANCES...
ps2(:)= (xs(:)-xi(:))**2+(ys(:)-yi(:))**2
pr2(:)= (xr(:)-xi(:))**2+(yr(:)-yi(:))**2
```

The heart of the migration operation appears in the next serial loop over ntau:

```
c	MAIN LOOP...

do 2000 itau=1,ntau

c	  Calculate migration imaging time...
tau= 0.5*(itau-1)*dt+t0
tau2= tau*tau
timg(:)= sqrt(ps2(:)/vs2(:)+tau2)+sqrt(pr2(:)/vr2(:)+tau2)

c	  Calculate cos(theta) and amplitude correction weights...
cosr(:)= tau/(timg(:)+1.0e-06)
amp(:)= sqrt(ps2(:)/vs2+tau2)/(sqrt(pr2(:)/vr2+tau2)+1.0e-06)

c	  Calculate image time pointers...
itimg1(:)= int((timg(:)-t0)/dt)+1
where(itimg1(:).ge.nt)  itimg1(:)= nt-1
timg1(:)= (itimg1(:)-1)*dt+t0
itimg2(:)= itimg1(:)+1

c	  Use indirect memory addressing to linearly interpolate trace amps...
call cmf_aref_1d(value1(:),data(:,:),itimg1(:),.true.)
call cmf_aref_1d(value2(:),data(:,:),itimg2(:),.true.)
trcamp(:)= (value1(:)+((value2(:)-value1(:))/dt)*(timg(:)-timg1(:)))

c	  Sum weighted trace value into migrated image...
img(itau,:)= cosr(:)*amp(:)*trcamp(:) + img(itau,:)

2000   continue
```

This main loop is extremely compact due to the high degree of parallel efficiency. We have left the explicit notation data(:,:) to demonstrate that all the calculations are being done in parallel over the nx*ny dimension. First, the vertical traveltimes tau and migration imaging times timg(:) are computed using heterogeneous source and receiver time migration (squared) velocities vs2(:) and vr2(:). Next, the obliquity and divergence compensation amplitudes, cosr(:) and amp(:), are calculated followed by the imaging time pointers itimg1(:) and itimg2(:). Next, the trace values are linearly interpolated in parallel using the fast in-processor addressing subroutine cmf_aref_1d. Finally, all migrated values are summed in parallel for one pseudodepth step tau into the migrated image img(:,:).

At this point, we have migrated each input trace into a single processor (i.e., for a fixed offset per processor). We need to repeat this operation by migrating the traces into other processors (offsets) until each trace has been migrated into each processor. This is efficiently accomplished by the CM cshift subroutine, which shifts each of the traces and its surface coordinate vector to its adjacent processor in parallel:

```
data(:,:)= 	cshift(data(:,:), 2, -1)
xs(:)= 		cshift(xs(:),1,-1)
ys(:)= 		cshift(ys(:),1,-1)
xr(:)= 		cshift(xr(:),1,-1)
yr(:)= 		cshift(yr(:),1,-1)
```

The number of cshifts required is equal to the size of the parallel dimension nx*ny. Depending on the relative sizes of data(:,:) and img(:,:), it may be more efficient to cshift img(:,:) and the coordinate vectors xi(:) and yi(:), rather than data, xs, ys, xr and yr as we have done.

The entire process described above is repeated for the number of generic trace gathers which need to be reloaded into core memory until all front-end trace data have been migrated. Finally, the migrated image is written in parallel from the CM to the front-end using the block data transfer subroutine cmf_fe_array_from_cm as follows:

```
do 3000 itau=1,ntau
call cmf_fe_array_from_cm(fe_temp,data(itau,:))
write(21,rec=itau) (fe_temp(ixy),ixy=1,nxny)
3000   continue
```

For efficiency, we have written the data out in transposed format, to be followed by Transp to map it back to trace sequential format.

Next: A 3-D SYNTHETIC EXAMPLE Up: THE CONNECTION MACHINE ALGORITHM Previous: Algorithm geometry
Stanford Exploration Project
12/18/1997