Accelerating 3D convolution using streaming architectures on FPGAs
Our exploration of FPGA convolution designs shows that the 'cube' stencil fits the FPGA streaming architecture much better than the 'star' stencil. We investigate in particular an architecture that processes multiple time steps in one pass. This approach removes the memory bandwidth constraint and improves performance at the cost of extra data buffering and streaming overhead. Experimental results show that the FPGA streaming architecture has great potential for accelerating 3D convolution and can achieve speedups of up to two orders of magnitude.
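
As a rough illustration of the trade-offs named above, the following Python sketch models a streaming design under simplifying assumptions that are ours, not taken from the paper: the volume streams in x-fastest raster order, boundary handling is ignored, and off-chip traffic is counted as one read and one write per grid point per pass. The function names (`star_offsets`, `cube_offsets`, `stream_window`, `words_moved_per_step`) are hypothetical helpers introduced only for this example.

```python
from itertools import product


def star_offsets(r):
    """Offsets (dx, dy, dz) of a 3D 'star' stencil: centre plus r points per axis direction."""
    pts = {(0, 0, 0)}
    for d in range(1, r + 1):
        pts |= {(d, 0, 0), (-d, 0, 0), (0, d, 0), (0, -d, 0), (0, 0, d), (0, 0, -d)}
    return sorted(pts)


def cube_offsets(r):
    """Offsets (dx, dy, dz) of a 3D 'cube' stencil: every point of a (2r+1)^3 box."""
    rng = range(-r, r + 1)
    return sorted(product(rng, rng, rng))


def stream_window(offsets, nx, ny):
    """Number of stream elements that must be buffered on chip so that all
    stencil points are available at once (assumed x-fastest raster order)."""
    linear = [dx + dy * nx + dz * nx * ny for dx, dy, dz in offsets]
    return max(linear) - min(linear) + 1


def words_moved_per_step(n_points, steps_per_pass):
    """Off-chip words (one read plus one write per point per pass), averaged
    over the time steps chained inside a single pass (assumed traffic model)."""
    return 2 * n_points / steps_per_pass


if __name__ == "__main__":
    r, nx, ny, nz = 2, 64, 64, 64
    for name, offs in (("star", star_offsets(r)), ("cube", cube_offsets(r))):
        print(f"{name}: {len(offs)} points, window of {stream_window(offs, nx, ny)} elements")
    for steps in (1, 2, 4):
        print(f"{steps} step(s) per pass: {words_moved_per_step(nx * ny * nz, steps):.0f} words per step")
```

Under this model, with radius 2 on 64 x 64 planes, the star stencil uses 13 points from a window of 16385 buffered elements, while the cube stencil uses 125 points from a window of 16645 elements. The buffering cost is nearly the same, but the cube stencil uses far more of the buffered data per output. Chaining several time steps in one pass divides the off-chip traffic per step accordingly, while each chained stage would need its own window buffer, which is one way to read the "extra data buffering" cost mentioned above.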