Next: FINITE-DIFFERENCE ALGORITHMS Up: Biondi: Wave-equation algorithms on Previous: PARALLEL PROGRAMMING IN FORTRAN

FOURIER DOMAIN ALGORITHMS

The Fourier domain algorithms are naturally parallel because the computations for the different frequencies (or wavenumbers) are usually independent. However, the execution of Fourier domain algorithms requires the transformation of the data by use of FFT's. Implementing efficiently FFT's on a parallel computer is difficult because FFTs are ``global'' algorithms, and thus they require a lot of data motions. There are many specialized algorithms for optimizing FFTs on different parallel architecture (Johnsson et al., 1989). Fortunately, many parallel computers has scientific library, of which FFTs are one of the main components (TMC, 1991). Geophysicists can thus avoid the troubles of coding efficient FFTs by themselves.

As an example of Fourier domain algorithms I discuss a prestack phase-shift migration (Gazdag, 1978). In a phase-shift migration the prestack data are first transformed into the frequency-wavenumber domain by a multidimensional FFT. Then, the transformed data are downward continued by multiplication with a complex exponential. Finally, the downward continued data are imaged at zero time and zero offset. The imaging at zero time and zero offset can be simply computed by summing the downward continued data for all frequencies, and all offset wavenumbers, without computing the inverse Fourier transform for all times and offsets.

The downward continuation is naturally parallel because the different wavenumber components of the data can be downward continued independently; they interact only when they are summed together for imaging the result at zero offset. This summation is easily coded in Fortran 90 by the intrinsic function SUM; the compiler takes care of issuing the correct instructions for the necessary interprocessor communications. Global communications, as the sum over one or more axes of an array, are usually very fast because they can use the whole bandwidth of the communication network linking the processors. Consequently, the time spent in communications is only a small percentage of the total run time (about 5-8 % in this particular case).

Not all the wavenumber components need to be downward continued, because some of them correspond to evanescent waves. The zeroing of the evanescent waves cannot be expressed with the usual serial IF statement, because all the wavenumbers are processed in parallel. Therefore the parallel conditional statement WHERE must be used for distinguishing between propagating and evanescent waves. A SIMD (Single Instruction Multiple Data) computer executes the two branches of a WHERE statement in sequence, one after the other, not in parallel. On the contrary, a MIMD computer (Multiple Instructions Multiple Data) could execute them in parallel. The sequential execution of the two branches of a WHERE statements is a potential disadvantage of SIMD computers, but it is not in this particular case, because the branch corresponding to evanescent waves simply sets to zero the data, and thus takes few CPU's cycles.

The kernel of phase-shift prestack migration is expressed in Fortran 90 by the following code fragment:

```C
C loop over migrated time
C
DO ITAU = 1,NTAU-1
C
C compute Vel*Vel*Ks*Ks and Vel*Vel*Kg*Kg
C
VELSQ = VELTAU(ITAU)*VELTAU(ITAU)
KSVELSQ = VELSQ*KS2SQ
KGVELSQ = VELSQ*KG2SQ

TAUSLICE = 0.

C
C loop over frequencies
C
DO IW = 1,NW
CMOMEGA = OMEGA(IW)

C
C downward propagate the non-evanescent waves
C
WHERE((CMOMEGA .GT. KSVELSQ).AND.(CMOMEGA .GT. KGVELSQ))

KTAU = -DTAUD2*(SQRT(CMOMEGA-KSVELSQ)+SQRT(CMOMEGA-KGVELSQ))
DATA(IW,:,:) = CMPLX(COS(KTAU),SIN(KTAU))*DATA(IW,:,:)

C
C zero the evanescent waves
C
ELSEWHERE
DATA(IW,:,:) = 0.
END WHERE

C
C image at zero time
C
TAUSLICE = TAUSLICE+DATA(IW,:,:)

END DO

C
C image at zero offset
C
MIGRES(ITAU+1,:) = SUM(TAUSLICE,DIM=1)

END DO
```

Notice that in a phase-shift algorithm the different frequencies components could have also been processed in parallel, but, by serially looping along the frequency axis, I avoid the redundant evaluation, and storage, of some working arrays (KSVELSQ and KGVELSQ in my code).

Prestack Stolt migration (Clayton and Stolt, 1981) can be implemented similarly to phase shift; also for Stolt migration the wavenumber components can be processed in parallel. However, in an efficient implementation of Stolt migration, the frequency axis must be local to each processor for avoiding interprocessor communications during the Stolt transformation of the frequency axis. If the frequency axis is kept local, the Stolt transformation can be simply implemented with the usual interpolation algorithms used on serial computers.

Next: FINITE-DIFFERENCE ALGORITHMS Up: Biondi: Wave-equation algorithms on Previous: PARALLEL PROGRAMMING IN FORTRAN
Stanford Exploration Project
12/18/1997