Next: Future Work Up: Pell and Clapp: Accelerating Previous: Subsurface offset imaging condition

Results

To test the applicability of this approach we compared the results of constructing angle gathers for the 2-D Marmousi synthetic dataset. Figure 3 shows the zero-subsurface offset image obtained by implementing the imaging step of shot profile migration on both the processor and the FPGA. The images are indistinguishable.

Figure 3: Comparison of the zero-subsurface offset image from implementing the shot profile imaging condition on the processor (top) and the FPGA (bottom).

The left panel of Figure 4 shows an angle gather constructed from the CPU implementation of the imaging condition. The remaining panels show the same angle gather obtained from the FPGA implementation of the imaging condition with decreasing floating-point precision. Note that, visually, the kinematics are identical.

Figure 4: The same angle gather obtained by various implementations of the shot profile imaging condition. The top-left panel shows the result from a CPU-based implementation. The remaining panels show various FPGA implementations. Note that they are visually kinematically identical.
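
The effect of reduced input precision on a subsurface offset gather can be explored with a small numerical experiment. The following is a minimal sketch, not the implementation used for Figure 4. It assumes the common form of the imaging condition, I(x,h) = sum over frequency of Re[conj(S(x-h)) R(x+h)], on a 1-D grid at a single depth; the function names and the synthetic wavefields are hypothetical, and rounding through IEEE single and half precision with Python's `struct` module is only a stand-in for whatever number formats the FPGA actually uses.

```python
import cmath
import struct


def round_to(fmt, value):
    """Round a float to the precision of a struct format ('f' = 32-bit, 'e' = 16-bit)."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def quantize(field, fmt):
    """Round the real and imaginary parts of every sample in field[frequency][x]."""
    return [[complex(round_to(fmt, s.real), round_to(fmt, s.imag)) for s in row]
            for row in field]


def offset_gather(src, rec, max_h):
    """Return image[h_index][x] = sum over frequency of Re(conj(S(x-h)) * R(x+h)).

    src and rec are indexed [frequency][x]; samples that fall outside the
    grid contribute nothing. Offsets run from -max_h to +max_h.
    """
    nx = len(src[0])
    image = [[0.0] * nx for _ in range(2 * max_h + 1)]
    for s_row, r_row in zip(src, rec):
        for ih, h in enumerate(range(-max_h, max_h + 1)):
            for x in range(nx):
                if 0 <= x - h < nx and 0 <= x + h < nx:
                    image[ih][x] += (s_row[x - h].conjugate() * r_row[x + h]).real
    return image


def synthetic_wavefields(nw=16, nx=32):
    """Hypothetical test data: plane-wave source field, receiver field = 0.5 * source."""
    src = [[cmath.exp(-1j * 0.3 * (w + 1) * x) for x in range(nx)]
           for w in range(nw)]
    rec = [[0.5 * s for s in row] for row in src]
    return src, rec


def max_rel_error(reference, test):
    """Largest absolute difference between two images, relative to the reference peak."""
    peak = max(abs(v) for row in reference for v in row)
    diff = max(abs(a - b)
               for ra, rb in zip(reference, test)
               for a, b in zip(ra, rb))
    return diff / peak


if __name__ == "__main__":
    src, rec = synthetic_wavefields()
    ref = offset_gather(src, rec, max_h=4)  # double-precision reference
    for label, fmt in (("32-bit", "f"), ("16-bit", "e")):
        img = offset_gather(quantize(src, fmt), quantize(rec, fmt), max_h=4)
        print(label, "max relative error:", max_rel_error(ref, img))
```

In this sketch only the inputs are rounded; the accumulation itself stays in double precision, so it illustrates the sensitivity of the gather to input quantization rather than the behavior of a reduced-precision datapath.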

To test the speed-up offered by the FPGA implementation we ran a larger 3-D problem: constructing subsurface offset gathers from 500 inline CMPs, 400 crossline CMPs, 200 frequencies, and 41 subsurface offsets. We compare our FPGA implementation to a 2.8 GHz AMD Opteron-based PC with 12 GB of RAM. The software implementation was written in C and compiled using both gcc and the Intel C Compiler with full optimization; the average of three runs was selected. The FPGA accelerator was implemented on a Maxeler MAX-1 FPGA platform equipped with a Xilinx Virtex-4 FX100 FPGA. The accelerator circuit consumes 58% of the logic resources of the device and runs at 125 MHz.

Table 1 shows the runtimes for the gather operation at a single depth and shot, carried out both in software and on the FPGA. The FPGA computes the gather 19-21 times faster than the software using 32-bit data, and 35-42 times faster using 16-bit data. This degree of acceleration transforms the application space: instead of dominating the overall runtime, the subsurface offset gather now accounts for an insignificant portion of it.

**Table 1:** Performance comparison of the FPGA and a 2.8 GHz AMD Opteron. T_sw is the time in seconds for the software version, T_fpga32 is the time for the FPGA processing 32-bit data, and T_fpga16 is the time for the FPGA processing 16-bit data. Speed-up is shown for both data sizes.
ny	T_sw	T_fpga32	Speed-up	T_fpga16	Speed-up
1	0.041	0.002	21x	0.001	41x
50	1.48	0.073	20x	0.042	35x
100	2.76	0.149	19x	0.075	37x
200	6.40	0.311	21x	0.150	42x
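
The speed-up columns are the ratio of software time to FPGA time. As a quick arithmetic check, the short sketch below recomputes those ratios from the timings in Table 1. The names `TABLE_1`, `PUBLISHED` and `speedups` are introduced here for illustration only. Because the tabulated times are rounded to a few digits, the recomputed ratios can differ from the published speed-ups by up to about one unit.

```python
# Runtimes in seconds transcribed from Table 1: ny -> (T_sw, T_fpga32, T_fpga16)
TABLE_1 = {
    1: (0.041, 0.002, 0.001),
    50: (1.48, 0.073, 0.042),
    100: (2.76, 0.149, 0.075),
    200: (6.40, 0.311, 0.150),
}

# Speed-ups as printed in Table 1: ny -> (32-bit, 16-bit)
PUBLISHED = {1: (21, 41), 50: (20, 35), 100: (19, 37), 200: (21, 42)}


def speedups(table):
    """Return ny -> (T_sw / T_fpga32, T_sw / T_fpga16)."""
    return {ny: (t_sw / t32, t_sw / t16)
            for ny, (t_sw, t32, t16) in table.items()}


if __name__ == "__main__":
    for ny, (s32, s16) in speedups(TABLE_1).items():
        p32, p16 = PUBLISHED[ny]
        print(f"ny={ny:3d}  32-bit: {s32:5.1f}x (table {p32}x)  "
              f"16-bit: {s16:5.1f}x (table {p16}x)")
```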


Stanford Exploration Project
5/6/2007