
Inversion and fault tolerant parallelization using Python

Robert G. Clapp

bob@sep.stanford.edu

ABSTRACT

Many current areas of research at SEP involve large-scale inversion problems that must be parallelized in order to be tractable. Writing fault-tolerant, parallel code requires significant programming expertise and overhead. In this paper, a library, written in Python, is described that effectively simulates a fault-tolerant parallel code, using simple serial programs. In addition, the library provides the ability to use these parallel objects in out-of-core inversion problems in a fault-tolerant manner.

INTRODUCTION

The large size of today's oil industry problems necessitates harnessing the power of clusters. The problem is that as we add nodes, we increase our odds of node failure. Inversion on large-scale problems is even more problematic. Operators can take days to weeks to run (, ) and can involve multiple instances of complex operations (). Running these problems on Beowulf clusters poses a problem as the odds of a multi-week job running without a node failing are low.

In (), I described a library, written in Python, that allows auto-parallelization with a high level of fault tolerance for almost any SEPlib program. Instead of handling parallelization within compiled code at the library level, the parallelization is done at the script level, which sits on top of the executables. The Python library distributes and collects the datasets, keeps track of which portions of the parallel job are done, and monitors the state of the nodes. The distribution and collection are done through MPI, but the individual jobs are all serial codes. The code is written using Python's object-oriented capabilities, so it is easily expandable. A parallel job is described by a series of files and a series of tasks.
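The bookkeeping described above (tracking which tasks are done, noticing a failed node, and handing its work to another node) can be illustrated with a minimal sketch. This is not the library's interface: the function `run_fault_tolerant`, the `run_on_node` callback, and the node and task names are all hypothetical, and the MPI distribution and collection steps are left out entirely. Each task stands in for one serial job.

```python
from collections import deque


def run_fault_tolerant(tasks, nodes, run_on_node, max_attempts=3):
    """Run every task once, reassigning work when a node fails.

    tasks       -- list of task identifiers (hypothetical names)
    nodes       -- list of node names (hypothetical names)
    run_on_node -- callable(node, task) that runs one serial job and
                   raises RuntimeError if the node fails
    Returns (done, alive): a dict mapping each task to the node that
    finished it, and the list of nodes still considered healthy.
    """
    pending = deque(tasks)
    attempts = {task: 0 for task in tasks}
    alive = list(nodes)
    done = {}
    turn = 0  # round-robin counter over the healthy nodes

    while pending:
        if not alive:
            raise RuntimeError("all nodes have failed")
        task = pending.popleft()
        node = alive[turn % len(alive)]
        turn += 1
        attempts[task] += 1
        try:
            run_on_node(node, task)
        except RuntimeError:
            # Treat the node as dead and put the task back in the queue.
            alive.remove(node)
            if attempts[task] >= max_attempts:
                raise
            pending.append(task)
        else:
            done[task] = node
    return done, alive


def flaky_runner(node, task):
    """Stand-in for launching a serial program; 'node2' always fails."""
    if node == "node2":
        raise RuntimeError(node + " is down")


done, alive = run_fault_tolerant(
    ["shot0", "shot1", "shot2", "shot3"],
    ["node1", "node2", "node3"],
    flaky_runner,
)
print(done, alive)
```

In this sketch a failed node is dropped permanently; a real system would more likely re-test nodes and record progress on disk so that a restarted job can skip finished tasks.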

For inversion problems, () describes a Python inversion library which uses abstract vector and operator descriptions. From these abstract classes I derive specific classes to handle out-of-core problems. Operators become wrappers around SEPlib programs and vectors wrappers around SEPlib files.
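To make the idea of abstract vectors and operators concrete, the following is a minimal sketch under stated assumptions. The class and method names (`Vector`, `Operator`, `ListVector`, `DiagonalOperator`, `steepest_descent`) are hypothetical and are not taken from the library. The concrete classes here hold their values in memory; in an out-of-core setting the same interface would instead wrap files and external programs. The solver touches only the abstract interface, which is what lets one solver serve both cases.

```python
from abc import ABC, abstractmethod


class Vector(ABC):
    """Abstract vector: a solver only needs these three operations."""

    @abstractmethod
    def clone(self): ...

    @abstractmethod
    def dot(self, other): ...

    @abstractmethod
    def scale_add(self, other, alpha):
        """In place: self = self + alpha * other."""


class Operator(ABC):
    """Abstract linear operator with a forward and an adjoint."""

    @abstractmethod
    def forward(self, model): ...

    @abstractmethod
    def adjoint(self, data): ...


class ListVector(Vector):
    """In-memory stand-in for a vector stored in a file."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def clone(self):
        return ListVector(self.values)

    def dot(self, other):
        return sum(a * b for a, b in zip(self.values, other.values))

    def scale_add(self, other, alpha):
        self.values = [a + alpha * b
                       for a, b in zip(self.values, other.values)]


class DiagonalOperator(Operator):
    """In-memory stand-in for an operator that wraps a program."""

    def __init__(self, weights):
        self.weights = list(weights)

    def forward(self, model):
        return ListVector(w * m for w, m in zip(self.weights, model.values))

    def adjoint(self, data):
        return ListVector(w * d for w, d in zip(self.weights, data.values))


def steepest_descent(op, data, model, niter):
    """Minimize |L m - d|^2 using only the abstract interface."""
    for _ in range(niter):
        residual = op.forward(model)
        residual.scale_add(data, -1.0)       # r = L m - d
        gradient = op.adjoint(residual)      # g = L' r
        step = op.forward(gradient)          # L g
        denom = step.dot(step)
        if denom == 0.0:
            break
        alpha = gradient.dot(gradient) / denom
        model.scale_add(gradient, -alpha)    # m = m - alpha g
    return model


op = DiagonalOperator([2.0, 4.0])
estimate = steepest_descent(op, ListVector([2.0, 8.0]),
                            ListVector([0.0, 0.0]), niter=100)
print(estimate.values)
```

Steepest descent is used here only because it is short; it is a swapped-in example, not a statement about which solvers the library provides.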

In this paper I introduce an improved version of the library described in () and (). The new version provides significant additional flexibility. Multiple programs can be combined into single executables. Parallel files can now be SEP3D files, and/or involve overlapping patches. Inversion can be done on parallel files (instead of collected on some master node), saving disk space and transfer time.

In the first portion of the paper I will cover the basic Python parallel and inversion objects. In the second portion I will show several examples of how to use these objects to accomplish tasks that are both memory and computationally intensive.


Stanford Exploration Project
5/3/2005