Next: Implementation Up: Moussa: GPGPU RTM Parallelism Previous: Hardware Platform

Evaluation Metrics

There are many ways to compare and evaluate parallelization schemes for RTM. Because the GPGPU approach is so novel, it is difficult to perform direct comparison with other parallelization schemes for Reverse Time Migration. Other hardware platforms do not provide the same software abstractions. Many of the GPGPU metrics thus have no direct comparable equivalent on alternative systems. Of course, key performance metrics are directly comparable to serial or parallel CPU RTM implementations. These include:

Total execution time
Cost ($) per FLOPS
FLOPS per Watt

Other internal performance metrics of my implementation can be compared to academic and industrial research progress in high-performance GPGPU wave propagation. Wave propagation has been previously implemented in Finite Difference Time Domain (FDTD) for nearly identical hardware (Micikeviciuis, 2008); the forward- and reverse-wavefield computation performance can be directly compared to such an implementation. FDTD performance measurements include:

Maximum computational grid size
Block subdivision size
Wavefield grid points per second
Numerical order of spatial derivatives

One goal of SEP’s investigations into various parallelization technologies is to subjectively evaluate the feasibility for future performance, ease of development, and maintainability of code. Technologies like the CUDA/GPGPU approach are compared subjectively to other systems, such as the SiCortex SC072 ``Desktop Cluster'' as well as conventional multicore and multi-node CPU parallelization. The following metrics can be roughly estimated for each technology, noting that there is some ambiguity in direct comparisons across widely varying exotic architectures:

Cost ($) per FLOPS
FLOPS per Watt
FLOP operations needed for complete migration
Execution time for complete migration
Accuracy of the wave propagation operator

As I am not trained as an interpretational geologist, subjective assessment of the image quality is difficult for me. Nonetheless, it has been widely established in industrial contexts that the correct implementation of RTM yields better images for decision-making and analysis. Certain computational architectures can enhance this effect by enabling higher-accuracy RTM, (e.g. using higher-order wavefield operators). By providing very cheap floating-point math, the GPGPU approach enables more operations per data point, allowing more accurate wave modeling with minimal execution time overhead. The overall speedup that a GPGPU implementation can provide can allow additional iterations as part of larger inversion problems, increasing the accuracy of these processes. The result is a subjectively better migrated image.

Finally, it is worth noting the benefits of GPGPU parallelization from a software engineering and code-maintenance standpoint. CUDA is designed to be simple, consisting of a set of extensions to standard C programming. The programming environment is easy to learn for most programmers. The code is systematically separated into host setup code and device parallelization code; and CUDA can interoperate with C or C++, allowing functional- or object-oriented system design, as the situation requires.

Next: Implementation Up: Moussa: GPGPU RTM Parallelism Previous: Hardware Platform

2009-05-05