If we do a full inverse, the obvious domain over which to parallelize the inversion is frequency. In this case the time axis of both the model and the data is log-stretched and transformed into the frequency domain. The resulting model and data spaces are approximately three times the size of their time-domain representations because of the oversampling that the log-stretch operation requires. In addition, both volumes need to be transposed: the log-stretch FFT operation is most naturally applied with the time/frequency axis as the inner axis, while the inversion is more efficient with the time/frequency axis as the outer axis. An out-of-core transpose grows in cost with the square of the number of elements. For efficiency, I do a pre- and post-step parallel transpose of the data in conjunction with the transformation to and from the log-stretched frequency domain. I split the data along the axis. In the pre-step, I log-stretch and FFT the input data, do an out-of-core transpose of this smaller volume, and then collect the transposed data. The post-step is simply the inverse: transpose, then inverse FFT and unstretch.
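To make the oversampling cost concrete, the following is a minimal sketch of a log-stretch resampling followed by a move of the stretched axis to the outer position. It is an illustration under stated assumptions, not the implementation used here: the function names, the linear interpolation, the rule for choosing the stretched sampling interval, and all numerical values are hypothetical, and the in-core transpose merely stands in for the out-of-core one. The FFT step is omitted.

```python
import math

def log_stretch(trace, dt, t_min):
    """Resample a trace from uniform t onto uniform tau = ln(t / t_min).

    Assumes len(trace) >= 2 and t_min > 0 (the log is undefined at t = 0).
    """
    n_t = len(trace)
    t_max = t_min + (n_t - 1) * dt
    # Assumed sampling rule: the stretched grid is coarsest in t at t_max,
    # where its spacing is about t_max * d_tau, so match that spacing to dt.
    d_tau = dt / t_max
    n_tau = math.ceil(math.log(t_max / t_min) / d_tau) + 1
    stretched = []
    for j in range(n_tau):
        t = t_min * math.exp(j * d_tau)
        x = min((t - t_min) / dt, n_t - 1.0)  # fractional input index, clamped
        i = min(int(x), n_t - 2)
        w = x - i
        # Linear interpolation between the two neighboring input samples.
        stretched.append((1.0 - w) * trace[i] + w * trace[i + 1])
    return stretched, d_tau

def time_axis_to_outer(traces):
    """Transpose [trace][sample] to [sample][trace] (in-core stand-in)."""
    return [list(column) for column in zip(*traces)]

# Hypothetical example: three traces, 4 ms sampling, 0.2 s to 4.0 s.
dt, t_min, n_t = 0.004, 0.2, 951
traces = [[t_min + i * dt for i in range(n_t)] for _ in range(3)]
stretched = [log_stretch(tr, dt, t_min)[0] for tr in traces]
volume = time_axis_to_outer(stretched)  # stretched axis is now the outer axis
oversampling = len(volume) / n_t
```

With these hypothetical numbers the stretched axis comes out a little over three times longer than the input axis. The factor depends on the ratio of the maximum to minimum time and on the sampling rule assumed above.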

A second major problem is the number of iterations necessary for convergence. The causal integration and leaky integration are good preconditioners (fast convergence), but the AMO portion tends to slow the inversion. As a result, many iterations (20-100) are desirable. The global inversion approach described in Clapp (2005b) is I/O dominated and relies on hardware stability. Both of these factors make a frequency-by-frequency in-core inversion the non-ideal but better choice. The major drawback of a frequency-by-frequency approach is that the frequencies might converge at significantly different rates, resulting in an image that is unrealistically dominated by certain frequency ranges (most likely the low frequencies). To minimize this problem, I stopped the inversion for each frequency after a set reduction in the data residual.
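The stopping rule can be sketched as follows. This is only an illustration of halting each frequency once its data residual has dropped by a fixed fraction. The solver is a generic conjugate-gradient least-squares loop on small dense matrices, which I assume here purely for demonstration, and the names `cgls`, `invert_by_frequency`, and `reduction` are hypothetical.

```python
import math

def matvec(A, x):
    """Apply the operator: y = A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def adjoint_matvec(A, y):
    """Apply the adjoint (conjugate transpose): x = A^H y."""
    return [sum(A[i][j].conjugate() * y[i] for i in range(len(A)))
            for j in range(len(A[0]))]

def norm(v):
    return math.sqrt(sum(abs(c) ** 2 for c in v))

def cgls(A, d, reduction=1e-3, max_iter=100):
    """Minimize |A x - d|, stopping once |r| <= reduction * |r0|."""
    x = [0j] * len(A[0])
    r = list(d)                      # data residual for the zero model
    r0 = norm(r)
    s = adjoint_matvec(A, r)         # gradient
    p = list(s)                      # search direction
    gamma = norm(s) ** 2
    n_iter = 0
    while n_iter < max_iter and gamma > 0.0 and norm(r) > reduction * r0:
        q = matvec(A, p)
        alpha = gamma / norm(q) ** 2
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = adjoint_matvec(A, r)
        gamma_new = norm(s) ** 2
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
        n_iter += 1
    return x, n_iter, (norm(r) / r0 if r0 > 0.0 else 0.0)

def invert_by_frequency(operators, data, reduction=1e-3, max_iter=100):
    """Solve each frequency independently with the same residual target."""
    return [cgls(A, d, reduction, max_iter) for A, d in zip(operators, data)]

# Hypothetical two-frequency example with different conditioning.
operators = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 0.1]]]
data = [[1 + 1j, 2.0], [1.0, 1.0]]
results = invert_by_frequency(operators, data)
```

Because every frequency is stopped at the same relative residual rather than at the same iteration count, a well-conditioned frequency finishes in fewer iterations than a poorly conditioned one. Since the frequencies are independent, the loop in `invert_by_frequency` is also the natural place to distribute work.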

The final issue is the size of the problem. The domain of is four-dimensional and can be quite large even for a relatively small model space. In addition, for a conjugate gradient approach we still must keep three copies of our data space (data, data residual, previous step data residual) and five copies of our model space (gradient, model, previous step, previous step model residual, model residual). As a result, we need a machine with significant memory and/or must break the problem into patches in the () plane.
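A back-of-the-envelope estimate of the in-core storage follows directly from the copy counts above. The sketch below is only that arithmetic. The function names, the 8-byte sample size (single-precision complex), and the assumption that patches divide the problem evenly with no overlap are all hypothetical.

```python
import math

DATA_COPIES = 3   # data, data residual, previous step data residual
MODEL_COPIES = 5  # gradient, model, previous step,
                  # previous step model residual, model residual

def cg_memory_bytes(n_data, n_model, bytes_per_sample=8):
    """Bytes needed to hold every conjugate-gradient vector in core."""
    return (DATA_COPIES * n_data + MODEL_COPIES * n_model) * bytes_per_sample

def patches_needed(n_data, n_model, mem_bytes, bytes_per_sample=8):
    """Smallest number of equal, non-overlapping patches that fit in memory.

    Assumes memory use scales linearly with patch size, which ignores any
    overlap between neighboring patches.
    """
    total = cg_memory_bytes(n_data, n_model, bytes_per_sample)
    return max(1, math.ceil(total / mem_bytes))
```

For example, `patches_needed(n_data, n_model, mem_bytes)` with the per-frequency element counts and the memory available on one node gives a lower bound on how finely the problem must be split. Overlapping patches would require more.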

4/5/2006