All of the speedups in this paper include the transfer time to and from
the processor. If multiple portions of the algorithm are performed on
the FPGA without returning to the CPU, the additional speedup can be
considerable. In the cases shown in this paper, the limiting factor is
the transfer time. For example, if the FFT and FK steps can reside simultaneously
on the FPGA, the cost of the FK step effectively disappears, since it requires no
additional transfer. In the case of acoustic modeling, multiple
time steps could be applied simultaneously.
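
The effect of keeping several stages on the FPGA can be illustrated with a simple cost model. The sketch below is only an illustration under stated assumptions: the function name `speedup`, its parameters, and all timing values are hypothetical and are not measurements from this paper. It assumes each stage run separately pays one full round trip of transfer, while stages that stay resident on the FPGA share a single round trip.

```python
def speedup(cpu_times, fpga_times, transfer_time, resident):
    """Speedup of an FPGA pipeline over the CPU, including transfer time.

    cpu_times, fpga_times: per-stage compute times (same units).
    transfer_time: one round trip of the data to the FPGA and back.
    resident: True if all stages stay on the FPGA and share one round trip;
              False if every stage pays its own round trip.
    """
    round_trips = 1 if resident else len(fpga_times)
    accelerated = sum(fpga_times) + round_trips * transfer_time
    return sum(cpu_times) / accelerated


# Hypothetical timings for two stages (for example, an FFT and an FK step).
cpu = [10.0, 10.0]   # assumed CPU time per stage
fpga = [0.5, 0.5]    # assumed FPGA compute time per stage
transfer = 2.0       # assumed round-trip transfer time

separate_speedup = speedup(cpu, fpga, transfer, resident=False)
resident_speedup = speedup(cpu, fpga, transfer, resident=True)

print(f"stages run separately: {separate_speedup:.2f}x")
print(f"stages resident on FPGA: {resident_speedup:.2f}x")
```

With these assumed values, transfer dominates the accelerated run time, so removing one round trip raises the overall speedup even though the compute times are unchanged.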