To image geological structures in the subsurface, the exploration seismic industry processes huge amounts of data with computationally intensive algorithms.
In a typical marine acquisition (see image on the right), data density is on the order of 10-100 GigaBytes per square kilometer, and surveys cover thousands of square kilometers. To image each of these surveys, the industry may employ a cluster farm of several thousand nodes for more than a month.
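For a rough sense of these volumes, the short sketch below simply multiplies survey area by data density; the 2,000-square-kilometer figure is an illustrative assumption, not a description of any particular survey.

# Back-of-the-envelope estimate of raw data volume for one marine survey.
# The survey area used below is an illustrative assumption only.
def survey_volume_tb(area_km2, density_gb_per_km2):
    """Raw data volume in TeraBytes for a survey of the given area."""
    return area_km2 * density_gb_per_km2 / 1024.0

# A hypothetical 2,000 km^2 survey at the low and high ends of the
# 10-100 GigaBytes-per-square-kilometer range quoted above.
for density_gb_per_km2 in (10, 100):
    volume_tb = survey_volume_tb(2000, density_gb_per_km2)
    print(f"{density_gb_per_km2:>3} GB/km^2 -> about {volume_tb:,.0f} TB of raw data")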
The seismic industry is probably the largest civilian user
of high-performance computing in the world.
Over the past three decades it has been an early adopter of each successive supercomputing technology, from the first Cray vector computers to massively parallel machines such as the Connection Machine.
Today, geophysical companies and oil companies rely on large Linux-cluster installations to image seismic data within a useful turnaround time.
Predicting the future of computer technology is a dangerous exercise, but whatever direction scientific computing takes next, whether Grid computing or the use of graphics-rendering hardware for numerical computation, seismic imaging is likely to be a massive user of those technologies.
At the Stanford Exploration Project (SEP) we validate our algorithms
on small subsets of the actual surveys.
However, many of our imaging methods require testing
on at least 10-50 square kilometers of data to be meaningful;
that is, we must be capable of handling data sets of a few hundred GigaBytes.
Today, our main computational platforms are several Linux clusters with an aggregate total of about 250 CPUs. Because of the size of the data sets and the data-locality requirements of our algorithms, each CPU needs at least 2-8 GigaBytes of readily accessible local memory.
We also benefit from large local disks on the nodes; our clusters now have up to 250 GigaBytes of disk on each node.
Details on SEP's supercomputing activities can be found on the Supercomputing at SEP page.
To build and maintain this computational facility, in 2003 we launched the Linux Cluster Initiative, a three-year addition to our regular affiliate program.
The image on the left shows one of our clusters, installed at the beginning of 2005, and the people who run it.
One of SEP's main goals is to develop the next generation
of imaging algorithms. These algorithms may be too computationally intensive to be applied at an industrial scale today,
but they will become cost-effective as
the computer industry drives computational costs down.
An important class of advanced algorithms that we are actively investigating requires the iterative inversion of wave-equation operators. A single iteration of these inversion processes on even a small data set (~10 square kilometers) takes several days on SEP's largest cluster (80 CPUs), and satisfactory results are achieved only after many iterations.
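As an illustration of the structure of such a computation, the sketch below solves a generic linear least-squares problem with a conjugate-gradient (CGLS) loop; the small random matrix merely stands in for a wave-equation modeling operator, whose forward and adjoint applications are what make each real iteration so expensive. This is a minimal sketch for illustration, not SEP's production code.

import numpy as np

# Minimal sketch of iterative least-squares inversion, m = argmin ||L m - d||^2,
# solved with conjugate gradients on the normal equations (CGLS).
# In the imaging problems discussed above, L would be a wave-equation
# modeling operator whose forward and adjoint each take hours on a cluster;
# here a small random matrix stands in for it purely for illustration.

def cgls(forward, adjoint, data, model0, niter):
    """Each iteration applies the forward operator and its adjoint once;
    those two applications dominate the cost in practice."""
    model = model0.copy()
    residual = data - forward(model)
    grad = adjoint(residual)
    direction = grad.copy()
    gnorm = np.dot(grad, grad)
    for _ in range(niter):
        ld = forward(direction)
        alpha = gnorm / np.dot(ld, ld)
        model += alpha * direction
        residual -= alpha * ld
        grad = adjoint(residual)
        gnorm_new = np.dot(grad, grad)
        direction = grad + (gnorm_new / gnorm) * direction
        gnorm = gnorm_new
    return model

# Toy problem: a 200x100 random "operator" in place of the wave equation.
rng = np.random.default_rng(0)
L = rng.standard_normal((200, 100))
true_model = rng.standard_normal(100)
data = L @ true_model

estimate = cgls(lambda m: L @ m, lambda r: L.T @ r, data, np.zeros(100), niter=50)
print("relative model error:",
      np.linalg.norm(estimate - true_model) / np.linalg.norm(true_model))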
To investigate the limitations of present methods and to devise new ones, we must run such large tasks many times.
We are thus always on the lookout for the latest high-performance computing innovations
because the future of SEP depends on them.