Accelerating 3D convolution using streaming architectures on FPGAs
Current Xilinx FPGAs contain three major categories of resources: (1) reconfigurable logic slices with 6-input lookup tables (LUTs) and flip-flops (FFs); (2) DSP48E arithmetic units that can perform multiplications; and (3) 36-Kbit Block RAMs (BRAMs) used as local storage or FIFOs.
In our work, we use the Maxeler MAX2 acceleration card, which contains two Virtex-5 LX330T FPGA chips, 12 GB of onboard memory, and a PCI-Express x16 interface to the host PC. Table 1 summarizes the resources of our current FPGAs and of the recently released Virtex-6 SX475T FPGA; Table 2 gives the basic cost of implementing single-precision floating-point units on FPGAs.
FPGAs | #LUTs | #FFs | #DSP48Es | #BRAMs |
LX330T | 207,360 | 207,360 | 196 | 324 |
SX475T | 287,600 | 595,200 | 2,016 | 1,064 |
Operations | #LUTs | #FFs | #DSP48Es | #BRAMs |
 | 425 | 557 | 0 | 0 |
 | 122 | 173 | 2 | 0 |
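To make the two tables easier to relate, the following minimal sketch estimates an upper bound on how many copies of each floating-point unit would fit on each FPGA, using only the figures listed above. The names `FPGAS`, `UNIT_COSTS`, and `max_units` are hypothetical and introduced here for illustration; the units are labelled by table row because the operation names are not given in the table. The estimate assumes the whole chip is available to a single unit type and ignores routing, control logic, and memory interfaces, so practical numbers would be lower.

```python
# Resource counts copied from Table 1 (per FPGA).
FPGAS = {
    "LX330T": {"LUT": 207_360, "FF": 207_360, "DSP48E": 196, "BRAM": 324},
    "SX475T": {"LUT": 287_600, "FF": 595_200, "DSP48E": 2_016, "BRAM": 1_064},
}

# Per-unit costs copied from Table 2; rows are labelled by position
# because the operation names are not given in the table.
UNIT_COSTS = {
    "row 1": {"LUT": 425, "FF": 557, "DSP48E": 0, "BRAM": 0},
    "row 2": {"LUT": 122, "FF": 173, "DSP48E": 2, "BRAM": 0},
}


def max_units(available, cost):
    """Upper bound on unit count: the scarcest resource sets the limit."""
    limits = [
        available[name] // amount
        for name, amount in cost.items()
        if amount > 0  # resources the unit does not use impose no limit
    ]
    return min(limits)


if __name__ == "__main__":
    for fpga, available in FPGAS.items():
        for unit, cost in UNIT_COSTS.items():
            print(f"{fpga}, {unit}: at most {max_units(available, cost)} units")
```

Under these assumptions, the unit in the first row is bounded by flip-flops on the LX330T and by LUTs on the SX475T, while the unit in the second row is bounded by DSP48E blocks on both devices.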