
Conclusion

The dramatic speedup of the computational kernel provides strong motivation for continued work in GPGPU parallelism. The benchmark results indicate that the most important area to tackle is Host-Device (PCI-e) bus bandwidth, which accounts for 90% of total run time.
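To make that bottleneck concrete, the following sketch (illustrative, not code from this report; the 256 MB buffer size and the use of page-locked host memory are assumptions) times a single host-to-device copy with CUDA events to estimate effective PCI-e bandwidth:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t bytes = (size_t)256 << 20;      /* 256 MB test buffer (assumed) */
    float *h_buf, *d_buf;
    cudaMallocHost((void **)&h_buf, bytes);      /* page-locked host memory */
    cudaMalloc((void **)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);      /* elapsed time in milliseconds */
    printf("Host->Device: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}

Repeating the measurement with ordinary pageable memory (malloc instead of cudaMallocHost) typically shows a markedly lower rate, which suggests pinned host buffers as one low-effort way to recover part of that 90%.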

At present, my implementation assigns no work to the high-performance Xeon processors on the host. These CPUs could carry out substantial useful work, such as data post-processing or visualization. An alternative architecture could tightly couple CPU and GPU tasks to maximize system utilization, as sketched below.
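A minimal sketch of such coupling, under stated assumptions (rtm_step and postprocess_on_cpu are hypothetical placeholder names, not functions from this report), overlaps an asynchronous kernel launch with host-side work:

#include <cuda_runtime.h>

/* Placeholder kernel standing in for one RTM time step (hypothetical). */
__global__ void rtm_step(float *wavefield, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) wavefield[i] *= 0.99f;
}

/* Hypothetical CPU-side task, e.g. post-processing the previous snapshot. */
static void postprocess_on_cpu(float *snapshot, int n)
{
    for (int i = 0; i < n; i++)
        if (snapshot[i] < 0.0f) snapshot[i] = 0.0f;
}

void timestep_overlapped(float *d_wave, float *h_prev, int n)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    /* The launch is asynchronous: control returns to the host immediately. */
    rtm_step<<<(n + 255) / 256, 256, 0, stream>>>(d_wave, n);

    /* The otherwise-idle Xeon cores work while the GPU computes. */
    postprocess_on_cpu(h_prev, n);

    cudaStreamSynchronize(stream);               /* rejoin before the next step */
    cudaStreamDestroy(stream);
}

In a full time loop this pattern would double-buffer the snapshots, so the CPU always processes step t-1 while the GPU computes step t.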

Another promising research area is compression during transfer. Velocity models contain large amounts of redundant data and should compress well; seismic records will probably not compress well under a lossless algorithm such as GZIP (LZ77-based), because they lack that redundancy. In future work, I will quantify these compression ratios for real data sets (a measurement sketched below), which will help validate the utility of compressed Host-Device communication.
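The measurement itself is straightforward. The sketch below (an illustration, not code from this report) uses zlib, the DEFLATE implementation behind GZIP, to compute the lossless compression ratio of an arbitrary buffer:

#include <stdlib.h>
#include <zlib.h>

/* Returns (uncompressed size) / (compressed size); a ratio well above 1.0
 * means the data compressed, while a ratio near 1.0 supports the
 * low-redundancy argument for seismic records. */
double compression_ratio(const unsigned char *buf, size_t n)
{
    uLongf out_len = compressBound(n);           /* worst-case DEFLATE size */
    unsigned char *out = malloc(out_len);
    if (!out || compress2(out, &out_len, buf, n, Z_BEST_SPEED) != Z_OK) {
        free(out);
        return 1.0;                              /* treat failure as no gain */
    }
    free(out);
    return (double)n / (double)out_len;
}

Running this over a gridded velocity model and over a raw shot record would put numbers on the redundancy argument above.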

The system-level transition to an exotic computing platform is always an engineering tradeoff: the performance benefits must be high enough to offset the development and maintenance costs of the new system. The GPGPU platform and its CUDA programming environment are sufficiently familiar to a geophysical programmer and expose massive parallelism in a straightforward way. The immediate performance boost is evident from the preliminary benchmarks presented in this report, and further significant optimizations can be realized through continued analysis and refinement of this GPGPU approach.


