Accelerating 3D convolution using streaming architectures on FPGAs
The oil industry has always been one of the leading consumers of high-performance computing systems. With CPU clock-frequency scaling coming to an end, we can no longer double our computation speed by buying new computers every eighteen months; instead, we must adapt to new computation architectures such as multi-core processors, General-Purpose Graphics Processing Units (GPGPUs), and Field-Programmable Gate Arrays (FPGAs).
Recent research has shown that FPGAs can provide a customized solution for a specific application and achieve speedups of more than two orders of magnitude over a single-core software implementation. Examples include cryptology applications (Cheung et al. 2005), finance and physics simulations (Zhang et al. 2005; Gokhale et al. 2004), and seismic computations (Nemeth et al. 2008).
The major difference between FPGAs and other computation platforms is the reconfigurability of the device's processing and storage units, which allows an FPGA to be configured into arbitrary processing units and circuit structures. This reconfigurability gives FPGAs two major advantages over other computation platforms:
(1) A streaming computation architecture. Whereas CPUs and GPGPUs execute a sequence of instructions that operate on corresponding data in memory, an FPGA maps the instructions into circuit units along the path from input to output and performs the computation by streaming data items through those units. The streaming architecture makes efficient use of the device, because every part of the circuit is operating on its own data item in the stream at the same time.
(2) Customizable number representations. While CPUs and GPGPUs natively handle only 8-, 16-, 32- or 64-bit variables, FPGAs support an arbitrary bit width for each variable in the design. By tailoring bit widths to the precision requirement, we can often significantly reduce both the silicon area of the arithmetic units and the bandwidth required between hardware modules, thus improving the throughput of the entire system.
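The streaming idea in item (1) can be modelled in software for a 3D stencil. The Python code below is a minimal sketch, not the design evaluated in this work: it assumes a hypothetical 7-point star stencil with coefficients `c0` (centre) and `c1` (six neighbours), input arriving one sample per step with x varying fastest, then y, then z, and boundary points simply dropped. A window of `2*nx*ny + 1` samples plays the role of the on-chip plane buffers, so every neighbour of the centre point is already on hand when a new sample arrives, and once the pipeline has filled, each input step yields at most one output.

```python
from collections import deque
from typing import Iterable, Iterator, List


def stream_stencil_7pt(stream: Iterable[float], nx: int, ny: int,
                       c0: float, c1: float) -> Iterator[float]:
    """Streaming model of a hypothetical 7-point star stencil.

    Samples arrive one at a time with x varying fastest, then y, then z.
    Only interior points are produced, in the same order as the input.
    """
    plane = nx * ny
    # Holds indices c - plane .. c + plane: every neighbour of the centre c.
    window: deque = deque(maxlen=2 * plane + 1)
    for n, sample in enumerate(stream):
        window.append(sample)
        if len(window) < window.maxlen:
            continue  # pipeline still filling (latency of two planes)
        c = n - plane                      # linear index of the centre point
        x, y = c % nx, (c // nx) % ny      # z is interior by construction
        if 0 < x < nx - 1 and 0 < y < ny - 1:
            neighbours = (window[plane - 1] + window[plane + 1]      # x - 1, x + 1
                          + window[plane - nx] + window[plane + nx]  # y - 1, y + 1
                          + window[0] + window[2 * plane])           # z - 1, z + 1
            yield c0 * window[plane] + c1 * neighbours


def reference_7pt(vol: List[float], nx: int, ny: int, nz: int,
                  c0: float, c1: float) -> List[float]:
    """Direct (non-streaming) evaluation of the same stencil, for checking."""
    out = []
    plane = nx * ny
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                i = z * plane + y * nx + x
                s = (vol[i - 1] + vol[i + 1] + vol[i - nx] + vol[i + nx]
                     + vol[i - plane] + vol[i + plane])
                out.append(c0 * vol[i] + c1 * s)
    return out


if __name__ == "__main__":
    nx, ny, nz = 5, 4, 6
    vol = [float((i * 7) % 11) for i in range(nx * ny * nz)]
    streamed = list(stream_stencil_7pt(iter(vol), nx, ny, -6.0, 1.0))
    print(streamed == reference_7pt(vol, nx, ny, nz, -6.0, 1.0))  # True
```

The `deque` stands in for the FIFO plane buffers. In hardware, each tap (`window[plane ± 1]`, `window[plane ± nx]`, `window[0]`, `window[2 * plane]`) would typically be a fixed buffer read port rather than an indexed lookup, which is what lets every stage of the circuit stay busy on each step.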
To investigate the capability of FPGAs for solving the 3D convolution problem, we explore design options such as: (1) using different stencils; (2) fitting multiple stencil operators into the FPGA; (3) processing multiple time steps in one pass; and (4) customizing the computation precision. The exploration reveals the constraints and tradeoffs between different design parameters and metrics. Experimental results show that the streaming computation architecture of FPGAs can provide up to two orders of magnitude speedup over a single-core software implementation.
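Customizing the computation precision, design option (4), amounts to choosing the narrowest word that still meets the accuracy target. The sketch below is illustrative only: it assumes signed fixed-point words with 2 integer bits (including the sign bit) and a variable number of fractional bits, round-to-nearest (ties to even, as Python's `round` does) with saturation, and data normalized to [-1, 1). The helper names `quantize` and `max_abs_error` are hypothetical. It reports the worst-case rounding error for several word widths next to the word width relative to 32-bit floating point, a first-order proxy for the bandwidth saving.

```python
import random
from typing import List


def quantize(x: float, int_bits: int, frac_bits: int) -> float:
    """Round x to the nearest signed fixed-point value with `int_bits` integer
    bits (including the sign bit) and `frac_bits` fractional bits, saturating
    at the representable range instead of wrapping on overflow."""
    scale = 1 << frac_bits
    total = int_bits + frac_bits
    lo, hi = -(1 << (total - 1)), (1 << (total - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale


def max_abs_error(samples: List[float], int_bits: int, frac_bits: int) -> float:
    """Worst-case absolute rounding error over a set of samples."""
    return max(abs(quantize(s, int_bits, frac_bits) - s) for s in samples)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
    for frac_bits in (8, 12, 16, 20):
        width = 2 + frac_bits
        err = max_abs_error(data, 2, frac_bits)
        print(f"{width:2d}-bit word: max error {err:.2e}, "
              f"{width / 32:.2f}x the bits of float32")
```

Without saturation the error is at most 2^-(frac_bits+1), so it halves with each extra fractional bit; the smallest acceptable width can therefore be read off from an application's error tolerance.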