
Parallelization and Regularization

The standard Beowulf cluster, including SEP's cluster, consists of many single- or dual-processor nodes, with fairly high-latency communication between nodes. As a result we want limited communication and as coarse-grained a scheme as possible. The AMO operation (Vlad and Biondi, 2001) consists of:
$\bf S_{{\rm log}}$
log stretch of time axis,
$\bf F_t$
FFT of the stretched time axis,
$\bf F_{xy}$
FFT of the cmpx and cmpy axes,
$\bf C$
complex multiplication,
$\bf F_{xy}^{-1}=\bf F_{xy}'$
inverse FFT of the cmpx and cmpy axes,
$\bf F_t^{-1}=\bf F_t'$
inverse FFT of the stretched time axis, and
$\bf S_{{\rm log}}^{-1}$
inverse log stretch of time axis.
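
A minimal sketch of this cascade in Python/numpy, intended only to show the ordering of the operators: the log stretch is approximated here by linear interpolation onto an assumed caller-supplied grid tau, and amo_phase is a hypothetical placeholder for the complex multiplier defined in Vlad and Biondi (2001).
\begin{verbatim}
import numpy as np

def resample(d, x_in, x_out):
    # Interpolate every trace of d (time on axis 0) from grid x_in to x_out.
    return np.apply_along_axis(lambda tr: np.interp(x_out, x_in, tr), 0, d)

def amo_apply(d, t, tau, amo_phase):
    # d         : cube with axes (t, cmpx, cmpy)
    # t, tau    : linear and log-stretched time grids (assumed, t[0] > 0)
    # amo_phase : complex multiplier C, same shape as the transformed cube
    m = resample(d, t, tau)                  # S_log: log stretch of time axis
    M = np.fft.fft(m, axis=0)                # F_t: FFT of stretched time axis
    M = np.fft.fft2(M, axes=(1, 2))          # F_xy: FFT of cmpx and cmpy axes
    M = amo_phase * M                        # C: complex multiplication
    M = np.fft.ifft2(M, axes=(1, 2))         # F_xy^{-1}
    m = np.fft.ifft(M, axis=0).real          # F_t^{-1}
    return resample(m, tau, t)               # S_log^{-1}: inverse log stretch
\end{verbatim}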
Our regularization operator is not limited to the AMO operator. Our choice of regularization also affects our ability to parallelize the problem. Ideally we would like a regularization operator that assesses continuity over all axes (a non-stationary prediction-error filter, for example), but that would either eliminate our ability to parallelize the problem or require massive communication between the nodes (to pass the boundary regions of the axes we parallelize over).

If we sacrifice regularizing along the time axis, the problem becomes more manageable. We can redefine our data as
\begin{displaymath}
\bf d_{\rm new}=\bf F_t\bf S_{{\rm log}}\bf d,\end{displaymath} (8)
where $\bf S_{{\rm log}}$ and $\bf F_t$ act along the first (time) axis.
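
A minimal numpy sketch of equation (8), assuming a regular time grid t and a caller-supplied log-stretched grid tau; transform_data is a hypothetical helper, and linear interpolation stands in for the actual log-stretch implementation.
\begin{verbatim}
import numpy as np

def transform_data(d, t, tau):
    # d_new = F_t S_log d: log-stretch the time axis, then FFT it.
    # S_log: interpolate every trace from the linear grid t onto the grid tau
    stretched = np.apply_along_axis(lambda tr: np.interp(tau, t, tr), 0, d)
    # F_t: FFT along the (stretched) first axis
    return np.fft.fft(stretched, axis=0)

# Each slice d_new[iw] holds one temporal frequency and shares no samples
# with the others, so the slices can be assigned to separate cluster nodes
# with no inter-node communication.
\end{verbatim}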

We can now split the data along the time axis and regularize along any of the remaining axes. For this paper I chose to regularize only along the offset axes. A 4-D prediction error filter would be preferable, but it would require either simultaneously infilling the data and estimating the filter, or some ad hoc scheme that is beyond the scope of this paper.

Regularizing only over offset also allows additional cost savings. Because $\bf F_{xy}$ acts only on the midpoint axes, it commutes with filters applied along offset, so we can pull the $\bf F_{xy}$ operator outside our filtering operation. For the cascaded derivative operation used in Biondi and Vlad (2001), we save the cost of six $\bf F_{xy}$ applications per iteration step (approximately a 67% reduction in cost).
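
An illustrative check of this commutation, on a hypothetical 4-D cube for one frequency with axes (cmpx, cmpy, hx, hy); np.diff stands in for whatever derivative filter is actually used.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
m = rng.standard_normal((8, 8, 6, 6)) + 1j * rng.standard_normal((8, 8, 6, 6))

def d_hx(m):
    # First difference along the hx axis (axis 2), zero-padded at the end.
    return np.diff(m, axis=2, append=0.0)

a = np.fft.fft2(d_hx(m), axes=(0, 1))   # derivative first, F_xy second
b = d_hx(np.fft.fft2(m, axes=(0, 1)))   # F_xy first, derivative second
assert np.allclose(a, b)                # identical: F_xy can be factored out
\end{verbatim}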

Our expanded model space (hy axis) requires a new regularization scheme. Two obvious choices come to mind. The first is to cascade derivative regularization along the hy axis. Our new regularization operator becomes
\begin{displaymath}
\bf A= \bf F_{xy}{\bf D_{hy,r}} {\bf D_{hy,l}} {\bf D_{hx,l}} {\bf D_{hx,r}} ,\end{displaymath} (9)
where $\bf D_{ha,b}$ takes the derivative (after transforming the (t,cmpx,cmpy) cube) along the $a$ offset axis in the $b$ direction. The other approach is to use some arbitrary filter, such as a factored Laplacian (Claerbout, 1999). I tested both methodologies. The first approach does not have a completely symmetric impulse response (Fomel, 2001), but proved to converge in fewer iterations.
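
A minimal sketch of the cascaded-derivative regularization of equation (9) for one frequency slice with axes ordered (cmpx, cmpy, hx, hy); simple forward- and backward-shifted first differences are assumed for the right and left derivatives, which may differ from the filters actually used.
\begin{verbatim}
import numpy as np

def deriv(m, axis, side):
    # D_{ha,b}: right ('r') or left ('l') shifted first difference along axis.
    if side == 'r':
        return np.diff(m, axis=axis, append=0.0)
    return np.diff(m, axis=axis, prepend=0.0)

def regularize(m):
    # A m = F_xy D_hy,r D_hy,l D_hx,l D_hx,r m   (cmpx=0, cmpy=1, hx=2, hy=3)
    r = deriv(m, 2, 'r')                 # D_hx,r
    r = deriv(r, 2, 'l')                 # D_hx,l
    r = deriv(r, 3, 'l')                 # D_hy,l
    r = deriv(r, 3, 'r')                 # D_hy,r
    return np.fft.fft2(r, axes=(0, 1))   # F_xy applied once, outside the cascade
\end{verbatim}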

Even on a small problem, the current formulation remains problematic on SEP's current architecture. Having to store six copies of the model exceeds our nodes' disk capacity, even after splitting the data along the time axis. The final simplification is to solve for each frequency independently instead of solving a single global inversion problem. This makes the problem manageable, but at a price: because we run only a small number of conjugate-gradient iterations, our solution step size (and direction after the first iteration) differs between the global problem and the individual local problems.
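
A hedged sketch of this per-frequency strategy: each frequency slice gets its own short conjugate-gradient solve of the regularized normal equations. Here modeling_op and regularization_op are stand-ins (assumed to be scipy LinearOperators providing matvec/rmatvec), eps is an assumed regularization weight, and scipy's generic CG replaces the actual SEPlib solver.
\begin{verbatim}
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_frequency(d_slice, modeling_op, regularization_op, eps, niter=5):
    # Fit one frequency slice: minimize |L m - d|^2 + eps |A m|^2 with a few
    # CG iterations on the (Hermitian) normal equations.
    n = modeling_op.shape[1]
    normal = LinearOperator(
        (n, n),
        matvec=lambda m: modeling_op.rmatvec(modeling_op.matvec(m))
        + eps * regularization_op.rmatvec(regularization_op.matvec(m)),
    )
    rhs = modeling_op.rmatvec(d_slice)
    m, _ = cg(normal, rhs, maxiter=niter)
    return m

# Each slice is solved with its own small iteration count, so the step sizes
# (and directions after the first iteration) differ from the global problem's.
\end{verbatim}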

