
Imaging condition costs in RTM

There are many different imaging condition choices in RTM, and they can have dramatically different cost profiles depending on the compute engine's memory structure and size, along with the implementation method. All RTM imaging conditions start from a source wavefield $s(x,y,z,t)$ and a receiver wavefield $g(x,y,z,t)$. Generally, some form of correlation of these two wavefields is performed to produce the RTM image. The data handling problem becomes challenging because the source wavefield is propagated forward in time while the receiver wavefield is propagated backward in time. As a result, one of the wavefields is stored to disk, or a checkpointing or random boundary condition approach is applied.

Until recently, a zero-time, zero-subsurface-offset imaging condition was the most commonly implemented method. This amounts to producing an image $I(x,y,z)$ by

$\displaystyle I(x,y,z) = \sum_{\rm shot} \sum_t s(x,y,z,t) g(x,y,z,t) .$ (3)
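
For illustration, a minimal in-core NumPy sketch of this zero-lag correlation for a single shot might look like the following; the snapshot arrays and the function name are hypothetical, and in practice the receiver snapshots would be streamed from disk rather than held in memory:

import numpy as np

def zero_lag_image(src_snaps, rcv_snaps):
    """Accumulate the zero-time, zero-offset image of equation (3)
    for one shot.  src_snaps and rcv_snaps are iterables of 3-D arrays
    s(x,y,z) and g(x,y,z), one pair per imaging time step."""
    image = None
    for s, g in zip(src_snaps, rcv_snaps):
        if image is None:
            image = np.zeros_like(s)
        image += s * g   # pointwise correlation, accumulated over t
    return image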

With this imaging condition, we normally store $I(x,y,z)$ at a relatively low-cost level of the memory hierarchy (it lives in RAM on a CPU, in global memory on a GPGPU, or in attached RAM on an FPGA). For simplicity, let's assume that $g(x,y,z,t)$ is written to disk in an uncompressed form and then read backwards during the source propagation. Given that we often need to propagate six to eight time steps between imaging steps due to stability constraints, to avoid an IO bottleneck we need the cost of reading the receiver wavefield to be no more than $1/6$ the cost of our finite-difference stencil. As of this writing, we are nearing a cross-over point where this will no longer hold.
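
To make the IO budget concrete, the following back-of-the-envelope sketch compares the time to read one receiver snapshot against the compute time of the intervening propagation steps; the grid size, bandwidth, and per-step time are purely assumed numbers for illustration, and it presumes the read is overlapped with the compute:

# Back-of-the-envelope IO budget for the stored receiver wavefield.
# All numbers are illustrative assumptions, not values from the text.
nx, ny, nz = 1000, 1000, 500      # assumed grid dimensions
bytes_per_cell = 4                # single-precision samples
steps_per_image = 6               # propagation steps per imaging step
step_time = 0.05                  # assumed seconds per propagation step
disk_bw = 2e9                     # assumed sustained read bandwidth (B/s)

snapshot_bytes = nx * ny * nz * bytes_per_cell
read_time = snapshot_bytes / disk_bw          # time to read one snapshot
compute_time = steps_per_image * step_time    # compute available to hide it

print(f"read {snapshot_bytes / 1e9:.1f} GB in {read_time:.2f} s, "
      f"compute window {compute_time:.2f} s -> "
      f"{'IO bound' if read_time > compute_time else 'compute bound'}")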

The relative cost of implementing the actual imaging condition is a little trickier. For this exercise, let's assume that we are implementing acoustic RTM on a GPGPU (its relatively simple memory model simplifies the calculation), using a naive second-order-in-time, 14th-order-in-space scheme. In this case, at each sample we need to read the previous wavefield and the velocity, and store the updated wavefield. In addition, we will access 43 values in the current wavefield, but because these are held in shared memory they cost approximately $\frac{1}{10}$ as much to access. Therefore, the cost of propagating the wavefield eight times between imaging steps will be $n_x n_y n_z \times 8 \times (3 + 43 \times 0.1)$. The imaging step at each point requires reading the source and receiver wavefields from global memory and reading and writing the image, a cost of $n_x n_y n_z \times 4$, an order of magnitude less than the propagation cost.
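
The arithmetic above can be written out explicitly; the following sketch simply evaluates the two access-count expressions for an arbitrary, assumed grid size:

# Relative memory-traffic cost per imaging interval, using the access
# counts quoted above: 8 propagation steps, each with 3 full-cost
# global accesses plus 43 shared-memory accesses at ~1/10 cost, versus
# 4 global accesses per point for the zero-offset imaging step.
nx, ny, nz = 1000, 1000, 500                     # assumed grid size

prop_cost = nx * ny * nz * 8 * (3 + 43 * 0.1)    # propagation
image_cost = nx * ny * nz * 4                    # zero-offset imaging

print(f"propagation : imaging = {prop_cost / image_cost:.1f} : 1")
# -> roughly 15 : 1, so the imaging step is an order of magnitude cheaper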

Alternative imaging condition choices such as offset gathers (Sava and Fomel, 2003), time-shift gathers (Sava and Fomel, 2006), and extended image gathers (Sava, 2007) dramatically alter this balance. For conciseness I will limit this discussion to subsurface offset gathers, but similar limitations apply to all of these choices. To construct the subsurface offset gathers used to form angle gathers, the image, in its most general form, expands by one to three dimensions. In the extreme case, the image becomes $I(x,y,z,h_x,h_y,h_z)$, where $h_x$, $h_y$, and $h_z$ represent lags in $x$, $y$, and $z$. The imaging condition then becomes

$\displaystyle I(x,y,z,h_x,h_y,h_z) = \sum_{\rm shot} \sum_t s(x+h_x,y+h_y,z+h_z,t)\, g(x-h_x,y-h_y,z-h_z,t) .$ (4)
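
As an illustration of how the extended imaging step differs, a minimal NumPy sketch of equation (4) restricted to horizontal lags $h_x$ might look like the following; the array shapes, the nh parameter, and the use of np.roll are illustrative choices rather than the implementation discussed here:

import numpy as np

def offset_image_update(image, s, g, nh):
    """Accumulate one time step of the subsurface-offset imaging
    condition of equation (4), restricted to lags h_x.

    image : array of shape (2*nh + 1, nx, ny, nz), indexed by lag
    s, g  : source and receiver snapshots of shape (nx, ny, nz)
    nh    : maximum lag in grid points
    """
    for ih, h in enumerate(range(-nh, nh + 1)):
        # s(x + h) and g(x - h); np.roll keeps the sketch short, a real
        # code would truncate at the boundaries instead of wrapping
        image[ih] += np.roll(s, -h, axis=0) * np.roll(g, h, axis=0)
    return image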

In this formulation, the cost of propagating the wavefield and reading the receiver wavefield from disk is the same as for the zero-time, zero-offset imaging condition, but the IO requirements and computational cost of the imaging steps change dramatically. The image is now 20 times (in the case where we evaluate only $h_x$) to several thousand times larger. As a result, it must now be read from and written to disk at every imaging step. If we assume that we are constructing 400 subsurface offsets at each point, the computational cost increases dramatically. We now benefit from transferring the source and receiver wavefields to shared memory, reducing their access cost by possibly up to 80%, but we still end up with a cost of $n_x n_y n_z \times 800 \times (2 \times 0.2 + 2)$, more than an order of magnitude more expensive than the propagation. However, that is not the real problem: we must read and write each image point at each imaging step, resulting in an 800-fold increase in the IO requirement. As a result, only approaches that construct several different azimuths of 1-D angle gathers have generally been demonstrated.
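
Plugging the numbers above into the same style of sketch makes the reversal explicit; the grid size is again an assumed value:

# Extended (subsurface-offset) imaging step, using the counts quoted in
# the text: 400 offsets, source and receiver reads from shared memory at
# ~0.2 cost each, and a global read plus write of the image per offset.
nx, ny, nz = 1000, 1000, 500                     # assumed grid size

prop_cost = nx * ny * nz * 8 * (3 + 43 * 0.1)
offset_image_cost = nx * ny * nz * 800 * (2 * 0.2 + 2)

print(f"imaging / propagation = {offset_image_cost / prop_cost:.0f}x")
# -> roughly 33x: the imaging step now dominates the propagation

# IO: reading and writing 400 offset volumes means 800 image volumes per
# imaging step, versus the single receiver snapshot read before.
print(f"image volumes moved per imaging step = {2 * 400}")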

Subsampling the subsurface offsets offers benefits in both computation and storage. Reducing the number of offsets computed proportionally reduces storage, and on large-memory machines it can eliminate the need to go to disk entirely. Total computation is also reduced by subsampling, but given the random selection that compressive sensing requires, cache misses will not be reduced proportionally.
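
A minimal sketch of the kind of random offset selection that compressive sensing implies is shown below; the function name and parameters are hypothetical and the values are only an example:

import numpy as np

def random_offset_subset(nh, keep_fraction, seed=0):
    """Choose a random subset of subsurface offsets to evaluate."""
    rng = np.random.default_rng(seed)
    lags = np.arange(-nh, nh + 1)
    n_keep = max(1, int(keep_fraction * lags.size))
    return np.sort(rng.choice(lags, size=n_keep, replace=False))

# e.g. keep a random 10% of 401 lags: storage and offset arithmetic drop
# by roughly 10x, but the scattered lags no longer stream through memory,
# so cache misses are not reduced by the same factor.
print(random_offset_subset(nh=200, keep_fraction=0.1))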


