CPU Implementation


Solving Equation 1 numerically with a finite-difference time-domain approach can be set up simply as a serial convolution repeated across the entire domain. This is, of course, the most naive and least efficient way to do it: for a very modest model of 1 million points and 8000 time steps, propagation takes around 93 minutes. The serial method can be improved by dividing the domain into smaller 'pencil'-shaped blocks along the fast axis, which gives a better cache hit rate, reduces accesses to the L3 cache and hides some latency. Blocking can roughly halve the computation time.
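The difference between the naive sweep and the pencil-blocked sweep can be sketched in C as follows. This is only an illustration under stated assumptions, not the code that was timed above: the original work used Fortran90, Equation 1 is assumed here to be a constant-velocity acoustic wave equation, and the stencil is a simple second-order 7-point Laplacian with a leapfrog time step. The names `IDX`, `update_point`, `step_naive`, `step_blocked`, `c2`, `by` and `bz` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Flattened index into an nx*ny*nz grid; x is the fast (contiguous) axis. */
#define IDX(i, j, k, nx, ny) \
    ((size_t)(k) * (size_t)(ny) * (size_t)(nx) + (size_t)(j) * (size_t)(nx) + (size_t)(i))

/* Update one interior point: 7-point Laplacian plus a leapfrog time step.
   c2 = (velocity * dt / dx)^2, assumed constant over the model for brevity. */
static void update_point(const float *prev, const float *cur, float *next,
                         int i, int j, int k, int nx, int ny, float c2)
{
    size_t p  = IDX(i, j, k, nx, ny);
    size_t sy = (size_t)nx;               /* stride between neighbours in y */
    size_t sz = (size_t)nx * (size_t)ny;  /* stride between neighbours in z */
    float lap = cur[p - 1]  + cur[p + 1]
              + cur[p - sy] + cur[p + sy]
              + cur[p - sz] + cur[p + sz]
              - 6.0f * cur[p];
    next[p] = 2.0f * cur[p] - prev[p] + c2 * lap;
}

/* Naive sweep: one pass over every interior point of the domain. */
void step_naive(const float *prev, const float *cur, float *next,
                int nx, int ny, int nz, float c2)
{
    for (int k = 1; k < nz - 1; ++k)
        for (int j = 1; j < ny - 1; ++j)
            for (int i = 1; i < nx - 1; ++i)
                update_point(prev, cur, next, i, j, k, nx, ny, c2);
}

/* Blocked sweep: the slow axes (y, z) are tiled into by x bz blocks, and each
   block runs the full length of the fast axis x, giving 'pencil' shapes. */
void step_blocked(const float *prev, const float *cur, float *next,
                  int nx, int ny, int nz, float c2, int by, int bz)
{
    assert(by > 0 && bz > 0);
    for (int k0 = 1; k0 < nz - 1; k0 += bz) {
        int k1 = (k0 + bz < nz - 1) ? k0 + bz : nz - 1;
        for (int j0 = 1; j0 < ny - 1; j0 += by) {
            int j1 = (j0 + by < ny - 1) ? j0 + by : ny - 1;
            for (int k = k0; k < k1; ++k)
                for (int j = j0; j < j1; ++j)
                    for (int i = 1; i < nx - 1; ++i)
                        update_point(prev, cur, next, i, j, k, nx, ny, c2);
        }
    }
}
```

Both routines perform the same arithmetic at every point; only the order in which points are visited changes. The block sizes `by` and `bz` would need tuning to the cache sizes of the target machine, and a production stencil would normally be of higher order than this sketch.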

Parallelising the non-blocked code over multiple cores gives a speed-up of around 3.5x on 8 cores, beyond which the speed-up saturates. With the best combination of blocking and parallelism found, the computation time falls to around 16 minutes, a speed-up of roughly 5.5x to 6x over the naive serial code. Nonetheless, 3D model sizes are typically two or three orders of magnitude larger than this, so for a realistic model size this parallel, blocked CPU method will take several hours per shot. It should be stressed that this is the best speed-up observed with a Fortran90 and OpenMP approach; further vectorising the loops and coding at the assembly level could accelerate it many times more.
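One way to combine blocking with multi-core parallelism is to hand whole pencil blocks to OpenMP threads. The C sketch below is a hedged illustration only, not the Fortran90 code measured above; it assumes the same simple 7-point stencil with constant velocity, and the names `IDX`, `step_blocked_omp`, `c2`, `by` and `bz` are hypothetical. Each thread writes to a disjoint set of points in `next`, so no synchronisation is needed within a time step.

```c
#include <assert.h>
#include <stddef.h>

/* Flattened index into an nx*ny*nz grid; x is the fast (contiguous) axis. */
#define IDX(i, j, k, nx, ny) \
    ((size_t)(k) * (size_t)(ny) * (size_t)(nx) + (size_t)(j) * (size_t)(nx) + (size_t)(i))

/* One leapfrog time step, with by x bz pencil blocks distributed over threads.
   c2 = (velocity * dt / dx)^2, assumed constant over the model for brevity. */
void step_blocked_omp(const float *prev, const float *cur, float *next,
                      int nx, int ny, int nz, float c2, int by, int bz)
{
    assert(by > 0 && bz > 0);
    int nby = (ny - 2 + by - 1) / by;   /* number of blocks along y */
    int nbz = (nz - 2 + bz - 1) / bz;   /* number of blocks along z */
    size_t sy = (size_t)nx;
    size_t sz = (size_t)nx * (size_t)ny;

    /* Each loop iteration processes one pencil block; blocks are independent. */
    #pragma omp parallel for schedule(static)
    for (int b = 0; b < nby * nbz; ++b) {
        int j0 = 1 + (b % nby) * by;
        int k0 = 1 + (b / nby) * bz;
        int j1 = (j0 + by < ny - 1) ? j0 + by : ny - 1;
        int k1 = (k0 + bz < nz - 1) ? k0 + bz : nz - 1;
        for (int k = k0; k < k1; ++k)
            for (int j = j0; j < j1; ++j)
                for (int i = 1; i < nx - 1; ++i) {
                    size_t p = IDX(i, j, k, nx, ny);
                    float lap = cur[p - 1]  + cur[p + 1]
                              + cur[p - sy] + cur[p + sy]
                              + cur[p - sz] + cur[p + sz]
                              - 6.0f * cur[p];
                    next[p] = 2.0f * cur[p] - prev[p] + c2 * lap;
                }
    }
}
```

With GCC or Clang the pragma takes effect when compiling with `-fopenmp`; without that flag it is ignored and the routine runs serially with the same result. A stencil sweep like this moves a lot of data for relatively little arithmetic, so it tends to be limited by memory bandwidth, which is one plausible reason for the speed-up saturating as cores are added.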

2011-05-24