Next: GPU based linearised inversion Up: Leader and Clapp: Linearised Previous: Introduction

GPU based RTM

Reverse Time Migration (Baysal et al., 1983) is increasingly becoming the industry and academic standard for seismic imaging. The full treatment of the wave equation provides us with accurate kinematic and amplitude information. Very few assumptions about the data (maximum dips, single scattering etc.) are necessary, relative to one-way wave equation or Kirchhoff imaging techniques. However, this process is computationally demanding and images still exhibit many artifacts due to the relatively simple nature of the imaging condition (Sun and Zhang, 2009).

We can describe an idealised modelling procedure as in equation 1, which is based on the first approximation of the Born scattering series. RTM is then the adjoint of this process, equation 2. Here $ d$ is the data, $ f$ the source function, $ G_0$ are the respective Green's functions, $ m$ the model, $\mathbf x$ the 3D model coordinates, $\mathbf x_{r,s}$ the 3D source and receiver coordinates and $ *$ denotes the complex conjugate. Despite this mathematical treatment assuming a single scattering, our propagator makes no such assumption and multiple scattering / diving waves are still positioned correctly.

$\displaystyle d(\mathbf x_r,\mathbf x_s,\omega)=\sum\limits_{\mathbf x_s,\omega... ...omega)m(\mathbf {x})\sum\limits_{\mathbf x_r}G_0(\mathbf x, \mathbf x_r,\omega)$

(1)

$\displaystyle m(\mathbf {x})=\sum\limits_{\mathbf x_s,\omega}f(\omega) G_0(\mat... ...athbf x_r}G_0(\mathbf x, \mathbf x_r,\omega)d^*(\mathbf x_r,\mathbf x_s,\omega)$

(2)

Wave propagation can be performed very efficiently on GPUs by taking advantage of shared memory and read redundancy along the y axis (Micikevicius, 2009). The main challenge for GPU based RTM comes from the complex conjugate in equation 2. Since equation 2 is expressed in frequency, this conjugate reverses the sense of time of the data $ d$ relative to the source function $ f$ . As such, the source wavefield (4D, in this case) is forward modelled and saved, and then read back to the GPU at each imaging time step while back propagating the recorded data for correlation. Typically this source wavefield is several hundred gigabytes, meaning disk storage is unavoidable (without compression). In the case where we must transfer the source wavefield from disk our GPU based RTM scheme is far from optimal, as all computational advantages are now being counteracted by compounded disk access.

The reason we cannot simply back propagate the source wavefield is due to the fact that we artificially remove boundary reflections during forward modelling. This violates the conservation of energy, and these source wavefields are consequently not time reversible. A solution for this is to pad our domain with random boundaries (Clapp (2009); Fletcher and Robertsson (2011); Shen and Clapp (2011)); boundary reflections are now incoherently scattered, and since we are not removing any energy it is time reversible, to within computational precision. Our multiplicative imaging condition and subsequent stacking across the shot axes reduce any residual boundary noise to imperctible levels after around 50 or so shots / realisations. For a detailed discussion of how to set up these boundaries and their scattering properties one can refer to Clapp (2009) or Leader and Clapp (2011).

Relative to source saving RTM we need to perform an extra propagation - the reversal and back propagation of the source wavefield, whereas previously we would just read back the appropriate wavefield slice. Fortunately, this extra computation is far faster than the wavefield reading scheme. In the case whereby the entire source wavefield is held by the CPU, and only CPU-GPU transfers are needed, the random boundary method is about 25% faster (Figure 1). For this to be the case the problem has to either be very small or the wavefield must be compressed or decimated, which will take extra computation. Once we have to use the disk we see the random boundary scheme is several times faster.

rtmsp Figure 1. Relative speeds for RTM implementations for propagation, imaging, damping, injecting and IO. The bold numbers are normalised elapsed times. (a) shows the most naive disk set up, (b) an optimised disk set-up with asynchronous transfer, (c) the case where the 4D wavefield can be held by the CPU memory and (d) random boundaries.

Next: GPU based linearised inversion Up: Leader and Clapp: Linearised Previous: Introduction

2012-05-10