The variation in approaches to implementing an optimized
version of RTM on CPU, GPGPU, and
FPGA architectures makes direct comparison infeasible. Finding the platform
that provides optimal RTM performance involves balancing price,
compute and memory capability, programmability, and
the set of algorithmic approaches that are feasible on a
given platform. Understanding the variation in underlying
architecture and the resulting optimal programming approach can
lead to new insights into how to produce optimal code.