Many-core and PSPI: Mixing fine-grain and coarse-grain parallelism
Next: Niagara2 Overview
Up: Liaw and Clapp: Niagara2
Previous: Liaw and Clapp: Niagara2
Seismic imaging problems lend themselves well to coarse-grain parallelism.
Kirchhoff migration can be parallelized by splitting the image space (and/or data
space) over many
processing units. Downward-continuation based migration can be parallelized over
frequency. Variants of downward-continuation and reverse-time
migration can be further parallelized over
shots or plane waves. All of these approaches can be described
as `coarse-grained'. Coarse-grained parallelism is well suited to the cluster computing
of the last decade. Several exciting new architectures, including Nvidia's Graphics
Processing Units (GPUs),
IBM's Cell, Field-Programmable Gate Arrays (FPGAs), and Sun's Niagara platform, are
aimed more at a fine-grained parallelism model. These platforms can run
tens to hundreds of threads, which often makes coarse-grained parallelism impractical because
of memory constraints.
Early results (Pell et al., 2008) on these
architectures are promising, but implementation can be challenging.
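The coarse-grained decomposition over frequency described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `image_one_frequency` and `migrate_coarse`, the grid parameters, and the constant-velocity phase shift that stands in for a real migration kernel. It is not the implementation used in this paper. The point is only the structure: each frequency is an independent task, and every task builds its own full-size partial image before the results are summed.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def image_one_frequency(omega, nz=4, dz=10.0, velocity=2000.0):
    """Toy stand-in for one frequency's contribution to the image.

    Applies a constant-velocity phase shift at each depth step and keeps
    the real part of the wavefield. This is NOT a real migration kernel;
    all parameters are hypothetical.
    """
    k = omega / velocity              # wavenumber for vertical propagation
    wavefield = 1.0 + 0.0j
    image = []
    for _ in range(nz):
        wavefield *= cmath.exp(-1j * k * dz)   # continue one depth step
        image.append(wavefield.real)
    return image


def migrate_coarse(frequencies, n_workers=4):
    """Coarse-grained parallelism: one independent task per frequency."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves the order of `frequencies`.
        partial_images = list(pool.map(image_one_frequency, frequencies))
    # Each task produced its own full-size image; sum them depth by depth.
    return [sum(values) for values in zip(*partial_images)]


if __name__ == "__main__":
    print(migrate_coarse([10.0, 20.0, 30.0], n_workers=2))
```

Because every task holds a private copy of the image, memory use grows with the number of concurrent tasks, which is the constraint noted above for machines with tens to hundreds of threads. In CPython the thread pool shows the task structure rather than a real speedup.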
Downward-continuation based migration (Claerbout, 1995) is a more challenging
imaging algorithm to implement on a fine-grained parallel machine.
The challenge in the implementation comes from the 2-D (shot-profile,
plane-wave), 3-D (common-azimuth), or 4-D (narrow-azimuth, full-azimuth)
FFT. The implicit transpose and the non-uniform data access pattern
do not port easily to FPGA and GPU solutions. The multiple-threads-per-core
approach of the Sun Niagara2 offers an easier route to parallelism.
In this paper we demonstrate that the optimal solution for phase-shift-plus-interpolation
(PSPI) migration on the Niagara2 is to mix the
coarse-grained and fine-grained parallelism models.
We begin by presenting an overview of the Niagara2 architecture
and the PSPI algorithm. We show how some portions of the PSPI algorithm benefit
from the Niagara2's multiple threads per core, while others show only minimal improvement.
We conclude by discussing the bottlenecks to further efficiency improvements.
2009-04-13