Accelerating seismic computations using customized number representations on FPGAs 
The mapping from the software code to a hardware circuit design is straightforward for most parts. Fig. 4 shows the general structure of the circuit design. Compared with the software Fortran code shown above, one big difference is the handling of the sine and cosine functions. In the software code, the trigonometric functions are calculated outside of the fivelevel loop, and stored as a lookup table. In the hardware design, to take advantage of the parallel calculation capability provided by the numerous logic units on the FPGA, the calculation of the sine/cosine functions are merged into the processing core of the inner loop. Three function evaluation units are included in this design, to produce values for the square root, cosine and sine functions separately. As mentioned in earlier, all three functions are evaluated using degreeone polynomial approximation with 386 to 512 uniform segments.
weiwemcircuit
Figure 4. General structure of the circuit design for the `wei_wem' function.  

The other task in the hardware circuit design is to map the calculation into arithmetic operations of certain number representations. The previous table shows the value range of some typical variables in the `wei_wem' function. Some of the variables (in the part of square root and sine/cosine function evaluations) have a small range within [0, 1], while other values (especially `wfld' data) have a wider range from to . If we use floatingpoint or LNS number representations, their wide representation ranges are enough to handle these variables. However, if we use fixedpoint number representations in the design, special handling is needed to achieve acceptable accuracy over wide ranges.

The first issue to consider in fixedpoint designs is the division after the evaluation of the square root, . Suppose the error in the square root result is , and the error in variable is , assuming the division unit itself does not bring extra error, the error in the division result is given by . As holds a dynamic range from to , and has a maximum value of (variables step%x and step%y have similar ranges), in the worst case, the error from can be magnified by 70 times, and the error from magnified by approximately 9000 times. The values of , and come from the software program as input values to the hardware circuit.
To solve this problem, we perform shifts at the input side to keep the three values , and in a similar range. For and the larger value between and , we perform the shifts so that the leading one of them is just right to the fractional point (in the form of ); for the smaller value between and , we assure it is shifted by the same distance as the larger value. The shifting distance difference between the and is recorded, so that after the division, the result can be shifted back into the correct scale. In this way, the has a range of and has a range of . Thus the division only magnifies the errors by an order of 3 to 6. Meanwhile, as the three variables , and are originally in single precision floatingpoint representation in software, when we pass their values after shifts, a large part of the information stored in the mantissa part can be preserved. Thus, a better accuracy is achieved through the shifting mechanism for fixedpoint designs.
itableerror
Figure 5. Maximum and average errors for the calculation of the table index when using and not using the shifting mechanism in fixedpoint designs, with different uniform bitwidth values from 10 to 20.  

Fig. 5 shows experimental results about the accuracy of the table index calculation when using shifting or not using shifting, with different uniform bitwidths. The possible range of the table index result is from 1 to 2001. As it is the index for tables with smooth sequential values, an error within five indices is generally acceptable. We assume that the table index results calculated with double precision floatingpoint representation are accurate enough and use them as the true values for error processing. When the uniform bitwidth of the design changes from 10 to 20, designs using the shifting mechanism show a stable maximum error of 3 and an average error around 0.11. On the other hand, the maximum error of designs without shifting vary from 2000 to 75, and the average errors vary from approximately 148 to 0.5. These results show that the shifting mechanism provides much better accuracy for the part of the table index calculation in fixedpoint designs.
The other issue to consider is the representation of `wfld' data variables. As shown in the table above, both the real and imaginary parts of `wfld' data have a wide range from to . Generally, fixedpoint numbers are not suitable for representing such wide ranges. However, in this seismic application, the `wfld' data is used to store the processed image information. It is more important to preserve the pattern information shown in the data values rather the data values themselves. Thus, by omitting the small values, and using the limited bitwidth to store the information contained in large values, fixedpoint representations still have a better chance to achieve accurate image in the final step. In our design, for convenience of bitwidth exploration, we scale down all the `wfld' data values by a ratio of so that they fall into the range of [0, 1).
Accelerating seismic computations using customized number representations on FPGAs 