High-Performance Computing & Seismic Imaging

Marine seismic data acquisition
To image geological structures in the subsurface, the exploration seismic industry processes huge amounts of data with computationally intensive algorithms. In a typical marine acquisition (see image on the right), data density is on the order of 10-100 gigabytes per square kilometer and surveys cover thousands of square kilometers. To image each of these surveys the industry may employ a large cluster farm of several thousand nodes for more than a month.
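To make the scale concrete, here is a back-of-envelope estimate of the raw data volume of a single survey, using only the order-of-magnitude figures quoted above; the 2,000 square-kilometer survey area is an assumed example value, not a measured survey statistic.

```python
# Back-of-envelope estimate of marine survey data volume, using only the
# order-of-magnitude figures quoted above (assumed values, not survey data).

def survey_data_volume_gb(area_km2, density_gb_per_km2):
    """Total raw data volume for a survey, in gigabytes."""
    return area_km2 * density_gb_per_km2

if __name__ == "__main__":
    area_km2 = 2000.0              # assumed area: "thousands of square kilometers"
    for density in (10.0, 100.0):  # quoted range: 10-100 GB per square kilometer
        total_gb = survey_data_volume_gb(area_km2, density)
        print(f"{density:5.0f} GB/km^2 over {area_km2:.0f} km^2 "
              f"-> {total_gb / 1000:.1f} TB of raw data")
```

Even at the low end of the quoted density, a single survey amounts to tens of terabytes of raw data before any processing.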
The seismic industry is probably the largest civilian user of high-performance computing in the world. Over the past three decades it has been an early adopter of each successive supercomputing technology, from the first Cray vector computers to massively parallel machines such as the Connection Machines. Today, geophysical companies and oil companies rely on large installations of Linux clusters to image seismic data within a useful turnaround time. Predicting the future of computer technology is a dangerous exercise, but whatever direction scientific computing takes next, whether Grid computing or the use of graphics-rendering hardware for numerical computation, seismic imaging is likely to be a massive user of those technologies.

Celebrating the arrival of SEP's new cluster (2005)
At the Stanford Exploration Project (SEP) we validate our algorithms on small subsets of actual surveys. However, many of our imaging methods require testing on at least 10-50 square kilometers of data to be meaningful; that is, we must be able to handle data sets of a few hundred gigabytes. Today, our main computational platforms are several Linux clusters with about 250 CPUs in total. Because of the size of the data sets and the data-locality requirements of our algorithms, each CPU needs 2-8 gigabytes of local memory readily accessible. We also benefit from large local disks on the nodes; our clusters now have up to 250 gigabytes of disk per node. Details on SEP supercomputing activities can be found on the Supercomputing at SEP page. To build and maintain this computational facility, in 2003 we launched the Linux Cluster Initiative, a three-year addition to our regular affiliate program. The image on the left shows one of our clusters, installed at the beginning of 2005, and the people who run it.
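As a rough illustration of these sizing constraints, the sketch below checks how evenly splitting a data set across cluster nodes interacts with local disk and per-CPU memory. The data-set size and node count are assumptions chosen for the example; the memory and disk figures are the ones quoted above.

```python
# Rough sanity check of how a data set of a few hundred gigabytes spreads
# across a cluster like the one described above. The node count and data-set
# size are illustrative assumptions; the memory and disk figures are the ones
# quoted in the text.

def per_node_share_gb(dataset_gb, n_nodes):
    """Gigabytes of data each node holds if the data set is split evenly."""
    return dataset_gb / n_nodes

if __name__ == "__main__":
    dataset_gb = 300.0            # "a few hundred gigabytes" (assumed value)
    n_nodes = 64                  # assumed node count for one cluster
    disk_per_node_gb = 250.0      # quoted local disk per node
    mem_per_cpu_gb = (2.0, 8.0)   # quoted local memory range per CPU

    share = per_node_share_gb(dataset_gb, n_nodes)
    print(f"Each node holds ~{share:.1f} GB of the {dataset_gb:.0f} GB data set")
    print(f"Fits on {disk_per_node_gb:.0f} GB local disk: {share <= disk_per_node_gb}")
    print(f"Working set must still be tiled to fit in {mem_per_cpu_gb[0]:.0f}-"
          f"{mem_per_cpu_gb[1]:.0f} GB of memory per CPU")
```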

One of SEP's main goals is to develop the next generation of imaging algorithms. These algorithms may be too computationally intensive to be applied at industrial scale today, but they will become cost-effective as the computer industry drives computational costs down. An important class of advanced algorithms that we are actively investigating requires the iterative inversion of wave-equation operators. Each iteration of these inversion processes on a small data set (~10 square kilometers) takes several days to run on SEP's largest cluster (80 CPUs), and satisfactory results are achieved only after many iterations. To investigate the limitations of present methods and to devise new ones, we must run such large tasks many times. We are thus always on the lookout for the latest high-performance computing innovations, because the future of SEP depends on them.
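As a minimal sketch of what one such inversion loop involves, the snippet below runs a generic conjugate-gradient least-squares iteration for a linear operator L. This is not SEP's actual algorithm; the tiny dense matrix only stands in for the wave-equation modeling and migration operators, whose application over hundreds of gigabytes of data is what makes each real iteration take days on a cluster.

```python
import numpy as np

# Generic sketch of iterative least-squares inversion m = argmin ||L m - d||^2.
# In a real wave-equation inversion, apply_L is forward modeling and apply_Lt
# is migration (the adjoint); here a small dense matrix stands in for both so
# the loop is runnable.

def cgls(apply_L, apply_Lt, d, n_model, n_iter=20):
    """Conjugate-gradient least squares on the normal equations L^T L m = L^T d."""
    m = np.zeros(n_model)
    r = d.copy()                 # data residual d - L m
    g = apply_Lt(r)              # gradient direction L^T r
    p = g.copy()
    g_norm2 = g @ g
    for _ in range(n_iter):
        q = apply_L(p)           # the expensive step: one forward modeling
        alpha = g_norm2 / (q @ q)
        m += alpha * p
        r -= alpha * q
        g = apply_Lt(r)          # the other expensive step: one adjoint (migration)
        g_norm2_new = g @ g
        p = g + (g_norm2_new / g_norm2) * p
        g_norm2 = g_norm2_new
    return m

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = rng.standard_normal((200, 100))   # toy stand-in for the wave-equation operator
    m_true = rng.standard_normal(100)
    d = L @ m_true                        # synthetic "recorded" data
    m_est = cgls(lambda x: L @ x, lambda y: L.T @ y, d, n_model=100)
    print("relative model error:",
          np.linalg.norm(m_est - m_true) / np.linalg.norm(m_true))
```

In this toy setting each call to apply_L or apply_Lt is a cheap matrix product; in the inversions described above, each such call is a full wave-equation run over the data set, which is why every additional iteration is so costly.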