Writing code that uses the same data-access pattern for all three architectures would lead to far-from-optimal code.
As a result, an unbiased comparison is difficult. Any one
of the three can be declared the best or worst depending on what constraints you choose to put
on your test. If you have a very large domain or a large stencil, the GPGPU isn't competitive.
If you don't have a large-domain requirement
and want optimized code that is the simplest to understand and maintain, the GPGPU is probably
the winner. The CPU has the advantage of being omnipresent, and all but the cache-specific
portions of the code are virtually guaranteed to perform well in the future, making it an
attractive alternative. On the other hand, the raw compute advantage of GPGPUs and FPGAs
looks set to grow over the next few years. FPGAs have the raw compute power
to rival GPGPUs and don't suffer from their size limitations,
but they use a significantly different programming model whose abstraction level
is still in flux. An abstraction layer that leaves the data movement and storage mechanism
to a precompiler while limiting itself to describing the mathematical update procedure,
similar to the FPGA example shown above, holds some potential.
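To make the idea concrete, the following is a minimal sketch, not the article's FPGA code, of what such a "math-only" kernel description could look like: only the update rule for one grid point is written down, while blocking, streaming, and memory placement are assumed to be handled by a hypothetical precompiler. The function name, the 8th-order finite-difference coefficients, and the scaling term v2dt2 = (v*dt/dx)^2 (assuming dx == dz) are illustrative assumptions, not taken from the article.

/* Sketch of a kernel limited to the mathematical update: a 2nd-order-in-time,
 * 8th-order-in-space acoustic wave-equation update for a single point.
 * curr and prev are the wavefields at times t and t-1; data movement and
 * storage are assumed to be generated by a (hypothetical) precompiler. */
static inline float rtm_update_point(const float *curr, const float *prev,
                                     float v2dt2,     /* (v*dt/dx)^2        */
                                     int ix, int iz,
                                     int nz)          /* fast-axis length   */
{
    /* 8th-order central-difference coefficients for the second derivative. */
    static const float c[5] = {-2.847222222f, 1.6f, -0.2f,
                                0.025396825f, -0.001785714f};
    int idx = ix * nz + iz;

    /* 2-D Laplacian: center term counted once per axis, then symmetric taps. */
    float lap = 2.0f * c[0] * curr[idx];
    for (int k = 1; k <= 4; k++) {
        lap += c[k] * (curr[idx + k]      + curr[idx - k]        /* z axis */
                     + curr[idx + k * nz] + curr[idx - k * nz]); /* x axis */
    }

    /* Leapfrog time step: next = 2*curr - prev + (v*dt/dx)^2 * Laplacian. */
    return 2.0f * curr[idx] - prev[idx] + v2dt2 * lap;
}

In such a scheme the programmer would maintain only this update rule; how the halo is fetched, how the domain is tiled across cache, device memory, or on-chip BRAM, and how time steps are pipelined would be left to the precompiler for each target architecture.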