Next: CUDA Programming Methodology Up: Moussa: GPGPU RTM Parallelism Previous: Moussa: GPGPU RTM Parallelism

Introduction

Reverse time migration (RTM) is often used for seismic imaging, as it has favorable numerical and physical properties compared to competing algorithms, and thus generates better images (Zhang and Sun, 2009). These benefits come at a high computational cost, so research effort is required to make RTM a more economically competitive method for seismic imaging. This cost is the motivation for implementing RTM with general-purpose GPU (GPGPU) parallelism.

The processing flow for imaging a seismic survey can be parallelized in many tiers. This multi-tiered parallelism has been noted in earlier computer architecture research for seismic imaging (Bording, 1996). The hierarchy is particularly prominent in RTM, and it provides opportunities for significant performance increases throughout the algorithm. A modern GPGPU platform, such as the Nvidia Tesla S1070, is well placed to mirror this tiered algorithm structure, because its architecture is similarly structured with both coarse-grain and fine-grain parallel capabilities.

At the highest level of abstraction, a data set can be divided into spatially separate regions of independent data (shot profiles). This is a Single Program, Multiple Data (SPMD) approach, and due to low data dependency, interprocess communication is generally not needed. This can directly map to a hardware multi-GPU implementation.
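The shot-level decomposition described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in this work: the function name `assign_shots` and the round-robin policy are assumptions made for the example, and a real code would launch one independent migration per device on the resulting work lists.

```python
def assign_shots(num_shots, num_gpus):
    """Round-robin assignment of shot indices to GPU ids (hypothetical helper).

    Each shot profile is independent, so every GPU runs the same program
    on its own list of shots and no interprocess communication is needed.
    """
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    plan = {gpu: [] for gpu in range(num_gpus)}
    for shot in range(num_shots):
        plan[shot % num_gpus].append(shot)
    return plan


if __name__ == "__main__":
    # Example: ten shot profiles spread over four devices.
    for gpu, shots in assign_shots(10, 4).items():
        print("GPU", gpu, "migrates shots", shots)
```

Round-robin is only one possible policy; if shot profiles differ widely in size, a size-aware assignment would balance the load better.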

Within each data subset, the migration can be further parallelized at a finer granularity. There are three potential stages for parallelism in the RTM algorithm (forward propagation, reverse propagation, and the imaging condition), but there is a severe data dependency limitation. The imaging condition requires the computed results of both the forward wavefield and the reverse wavefield at each time step. Unfortunately, because the two wavefields are computed in opposite time directions, performing the imaging condition usually requires computing the complete forward wavefield, writing it to disk, and then reading back its precomputed values for the imaging condition correlation as soon as the matching time step is available from the reverse-time wavefield. This data dependency is a major obstacle to parallelism at this stage, and constrains performance.
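The dependency can be made concrete with a toy sketch of the per-shot loop structure. Everything here is an assumption for illustration: `rtm_single_shot`, `forward_step`, and `reverse_step` are hypothetical names, wavefields are plain Python lists, and an in-memory list stands in for the disk storage mentioned above. The propagators are passed in as arbitrary functions, so the sketch shows only the ordering constraint, not a real wave-equation solver.

```python
def rtm_single_shot(forward_step, reverse_step, src0, rec0, num_steps):
    """Toy per-shot RTM loop (hypothetical sketch, not the author's code).

    forward_step / reverse_step: functions mapping one wavefield snapshot
    (a list of floats) to the next snapshot in their own time direction.
    src0: forward wavefield at the first time index.
    rec0: reverse wavefield at the last time index.
    """
    # Stage 1: forward propagation. Every snapshot must be kept, because
    # the reverse pass will need them in the opposite order.
    saved = []  # stand-in for writing snapshots to disk
    wave = src0
    for _ in range(num_steps):
        saved.append(wave)
        wave = forward_step(wave)

    # Stages 2 and 3: reverse propagation, applying the imaging condition
    # as soon as each reverse time step becomes available.
    image = [0.0] * len(src0)
    wave = rec0
    for t in range(num_steps - 1, -1, -1):
        fwd = saved[t]  # stand-in for reading the snapshot back from disk
        for i in range(len(image)):
            image[i] += fwd[i] * wave[i]
        wave = reverse_step(wave)
    return image


if __name__ == "__main__":
    double = lambda w: [2.0 * x for x in w]   # placeholder "propagator"
    shift = lambda w: [x + 1.0 for x in w]    # placeholder "propagator"
    print(rtm_single_shot(double, shift, [1.0, 2.0], [0.0, 1.0], 3))
```

The first loop must finish completely before the second loop can produce its first image contribution, which is the serialization the text describes.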

At the finest level of parallelism, the individual wavefield propagation steps can reduce the computational load by taking advantage of vectorization, floating-point math optimizations, and numerical reorganization. The imaging condition can also benefit from parallelization, because it is essentially a large 2D or 3D correlation. This is easily vectorizable, and is especially suitable for a GPU, which was originally designed as a large vector computer.
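The correlation form of the imaging condition is commonly written as a zero-lag cross-correlation over time. As a sketch, with $S$ and $R$ as assumed symbols for the forward and reverse wavefields (they are chosen here for illustration and may differ from the notation used elsewhere in this report) and $I$ as the image:

```latex
\[
  I(\mathbf{x}) \;=\; \sum_{t} S(\mathbf{x}, t)\, R(\mathbf{x}, t)
\]
```

Each image point $\mathbf{x}$ depends only on the two wavefield values at that same point and time step, so the products for different points are independent of one another, which is why the operation vectorizes so readily.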

Clearly, the GPGPU platform provides multi-tiered parallelism capability that matches the RTM structural design. The encouraging preliminary results seem to confirm that the GPGPU platform is well suited to RTM optimization, and suggest that further optimization can continue to yield dramatic execution time improvements. This will allow more advanced processing with correspondingly better subsurface image results.

2009-05-05