next up previous [pdf]

Next: Bit-accurate Value Simulator Up: Customized Number Representations Previous: Profiling


Circuit Design: Basic Arithmetic &
Elementary Function Evaluation

After profiling range information for the variables in the target code, the second step is to map the code into a circuit design described in ASC.

As a high-level FPGA programming language, ASC provides hardware data-types, such as HWint, HWfix and HWfloat. Users can specify the bit-width values for hardware variables, and ASC automatically generates corresponding arithmetic units for the specified bit-widths. It also provides configurable options to specify different optimization modes, such as AREA, LATENCY and THROUGHPUT. In the THROUGHPUT optimization mode, ASC automatically generates a fully-pipelined circuit. These features make ASC an ideal hardware compilation tool to re-target a piece of software code onto the FPGA hardware platform.

ASC does not have inherent support for LNS numbers. This part is covered by our previous work on the LNS arithmetic library generator (Fu et al., 2007), which produces optimized LNS arithmetic units with customizable bit-width values, also in ASC format.

Thus, with support for fixed-point, floating-point and LNS arithmetic operations, the target Fortran code can be transformed into ASC C++ code in a straightforward manner. We also have interfaces provided by ASC and the LNS library generator to modify the internal settings of these arithmetic units.

In seismic applications, evaluation of elementary functions takes a large part in the application. For instance, in the first piece of target code we try to accelerate, the `complex exponential function'. A large part of the computation is to evaluate the square root and sine/cosine functions. To map these functions into efficient units on the FPGA board, we use a table-based uniform polynomial approximation approach, based on Dong-U Lee's work on optimizing hardware function evaluation (Lee et al., 2005). The evaluation of the two functions can be divided into three different phases (Muller, 1997):

To keep the whole unit small and efficient, we use degree-one polynomial so that only one multiplication and one addition are needed to produce the evaluation result. Meanwhile, to preserve the approximation error at a small scale, the reduced evaluation range is divided into uniform segments. Each segment is approximated with a degree-one polynomial, using the minimax algorithm. In the case of the `complex exponential' code segment, the square root function is approximated with 384 segments in the range of [0.25, 1] with a maximum approximation error of $4.74\times10^{-7}$, while the sine and cosine functions are approximated with 512 segments in the range of [0, 2] with a maximum approximation error of $9.54\times10^{-7}$.


next up previous [pdf]

Next: Bit-accurate Value Simulator Up: Customized Number Representations Previous: Profiling

2007-09-18