
Programming model limitations

To achieve maximum GPGPU performance, we need to maximize the number of threads per thread block and launch at least as many thread blocks as there are processing units. On a high-end GPGPU, this means we would like to have at least 100,000 identical tasks. Unfortunately, this requirement eliminates some of the optimization opportunities available on the CPU. Cache-oblivious algorithms are impractical, and we are limited in our ability to compress model parameters. Both limitations put significantly more strain on global memory. A potentially greater problem is introduced by the boundary conditions. The SIMD parallelism required for performance on the GPU makes complex boundary conditions, such as PML and probably the zero-slope boundary condition, impractical. As a result, we are limited to damping schemes that require us to expand the computational domain and that still achieve damping results that are sub-optimal compared to PML schemes.
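
To make the trade-off concrete, the following is a minimal one-dimensional sketch of a damping (sponge) boundary, written in Python for readability, not as GPU code. It is an illustration under stated assumptions and not the scheme used in this work: the Gaussian-shaped taper, the constant `alpha = 0.015`, the padding width `nb = 20`, and all function names are hypothetical choices. It shows the two properties discussed above: the domain grows by `2 * nb` cells, and every cell receives the same multiply, so no thread would need to branch on its position.

```python
import math

def sponge_taper(nb, alpha=0.015):
    """Weights for nb padding cells (assumed Gaussian-shaped taper).

    Index 0 is the outer edge (strongest damping); index nb - 1
    borders the interior (weakest damping).
    """
    return [math.exp(-(alpha * (nb - i)) ** 2) for i in range(nb)]

def damping_mask(n, nb, alpha=0.015):
    """1-D mask for n interior cells padded by nb cells on each side."""
    taper = sponge_taper(nb, alpha)
    # Interior cells keep weight 1.0; the padded cells are tapered.
    return taper + [1.0] * n + taper[::-1]

def apply_damping(field, mask):
    """Scale every cell by its mask value: one identical multiply per cell."""
    return [f * m for f, m in zip(field, mask)]

# Hypothetical sizes: 100 interior cells, 20 padding cells per side.
n, nb = 100, 20
mask = damping_mask(n, nb)
field = apply_damping([1.0] * (n + 2 * nb), mask)
print(len(field))  # padded length, n + 2 * nb
```

In a time-stepping code the mask would be applied to the wavefield at every step, so the padded cells add to both the memory footprint and the work per step. That is the cost of avoiding a position-dependent boundary treatment.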



2009-10-16