
Development

The development of the BP principle begins with the assumption that the dictionary of indexed elementary waveform atoms, $\phi_i$, used to analyze the signal is overcomplete. This means that a signal, $\mathbf{s}$, with n points is decomposed into a sum of all the atoms in the dictionary, each scaled by a scalar energy coefficient, $\alpha_i$, where the number of atoms is much larger than the signal length n. Thus,
\begin{displaymath}
\mathbf{s} = \sum_i \alpha_i \, \phi_i
\end{displaymath} (1)
has many possible configurations: any linearly independent subset of atoms can act as a basis, and the atoms need not be orthogonal. This differs from conventional Fourier analysis, where a signal of length n is decomposed onto n independent, orthogonal basis functions. There, the representation of the signal through the waveform dictionary is unique because the dictionary is merely complete. Many such dictionaries exist, including wavelets, cosines, chirps, etc. An overcomplete dictionary can be devised by combining multiple dictionaries, or by oversampling a single choice. Using the Fourier example again, a four-fold overcomplete dictionary would be one whose indexed sinusoidal waveforms sweep through the standard frequency range, but at a four-fold finer sampling interval.
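To make the oversampling idea concrete, the following minimal sketch builds a four-fold overcomplete cosine dictionary with NumPy and confirms that a signal built from a few atoms can be reproduced from it. The sizes and names (n, over, Phi) are illustrative choices, not quantities from this paper.

```python
import numpy as np

n = 64          # signal length
over = 4        # four-fold oversampling of the frequency axis

# A complete cosine dictionary would index n frequencies on [0, pi);
# sampling that axis 'over' times more finely yields over*n atoms, so
# the dictionary matrix Phi has many more columns (atoms) than rows.
t = np.arange(n)
freqs = np.linspace(0.0, np.pi, over * n, endpoint=False)
Phi = np.cos(np.outer(t, freqs))                  # shape (n, over*n)

# A signal built from just two atoms has infinitely many exact
# representations s = Phi @ alpha; lstsq returns the minimum l2-norm
# one, which BP would replace with the minimum l1-norm one.
alpha_true = np.zeros(over * n)
alpha_true[[10, 100]] = [1.0, -0.5]
s = Phi @ alpha_true
alpha_l2, *_ = np.linalg.lstsq(Phi, s, rcond=None)
print(Phi.shape, np.allclose(Phi @ alpha_l2, s))
```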

Given an overcomplete dictionary for analysis, BP simply states the goal of choosing the one representation of the signal whose coefficients, $\alpha_i$, have the smallest $l_1$ norm. The pursuit, then, is a search through the dictionary for the minimum number of atoms that will act as bases and represent the signal precisely. Mathematically, this takes the form
\begin{displaymath}
\mbox{min} \; \Vert\alpha\Vert_1 \;\;\;\; \mbox{subject to} \;\; \Phi\alpha = \mathbf{s}
\end{displaymath} (2)
where $\Phi$ is the decomposition operator associated with the indexed waveforms $\phi_i$ above. This formulation, demanding an $l_1$ minimization, has been investigated through iteratively reweighted least squares (IRLS) by many other authors in the geophysical literature (Darche, 1989; Nichols, 1994; Taylor et al., 1979). Endemic to this approach, however, are issues related to defining convergence and the difficulty of choosing appropriate values from the parameter space. Huber-norm tactics share similar issues. BP, however, adopts the infrastructure of primal-dual linear programming to solve the system.

Linear programming (LP) has a large literature and several techniques by which to solve problems of the form  
\begin{displaymath}
\mbox{min}_{\mathbf{x}} \; \mathbf{c}^T \mathbf{x} \;\;\;\; \mbox{subject to} \;\; \mathbf{A}\mathbf{x} = \mathbf{b}, \;\; \mathbf{x} \geq 0 \; .
\end{displaymath} (3)
This optimization problem minimizes an objective cost function, $\mathbf{c}^T \mathbf{x}$, subject to two conditions: an equality constraint tying an operator to some data, and a positivity requirement on the solution. The underlying proposition of Chen et al. (1999) is the equivalence of equations (2) and (3). If the analysis operator $\Phi$ is used as $\mathbf{A}$, the cost function $\mathbf{c}$ is set to ones, and the signal $\mathbf{s}$ is used as the forcing vector $\mathbf{b}$, the BP principle falls neatly into the LP format.
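As a concrete illustration of this equivalence, the sketch below uses the standard device of splitting the signed coefficients as $\alpha = \mathbf{u} - \mathbf{w}$ with $\mathbf{u}, \mathbf{w} \geq 0$, so that the problem fits the nonnegative standard form of equation (3), and then hands it to a generic LP solver (scipy.optimize.linprog). The splitting trick, the generic solver, and the small test sizes are my own illustrative choices; BP's own solver is the interior-point scheme described next.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit_lp(Phi, s):
    """Solve min ||alpha||_1 subject to Phi @ alpha = s by recasting as an LP.

    Split alpha = u - w with u, w >= 0, so ||alpha||_1 = sum(u + w) at the
    optimum.  Then x = [u; w], A = [Phi, -Phi], c = 1, and b = s as in (3).
    """
    n_atoms = Phi.shape[1]
    c = np.ones(2 * n_atoms)               # cost vector of ones
    A_eq = np.hstack([Phi, -Phi])          # equality-constraint operator
    res = linprog(c, A_eq=A_eq, b_eq=s, bounds=(0, None), method="highs")
    x = res.x
    return x[:n_atoms] - x[n_atoms:]       # recover signed coefficients

# Tiny demonstration with a random overcomplete dictionary.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((16, 64))
alpha_true = np.zeros(64)
alpha_true[[3, 40]] = [1.0, -2.0]
s = Phi @ alpha_true
alpha_bp = basis_pursuit_lp(Phi, s)
print(np.allclose(Phi @ alpha_bp, s, atol=1e-6))
```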

Having recognized the format, one needs only to choose among the available methods for solving an LP problem. BP chooses an interior-point (IP) method, as opposed to the simplex method of Claerbout and Muir (1973), to solve the LP. IP introduces a two-loop structure to the solver: a barrier loop that calculates feasibility and direction variables, and an interior solver loop that uses a conjugate-gradient minimization of a system derived from the current state of the solution and the computed directions. Heuristically, we begin with non-zero coefficients across the dictionary and iteratively modify them, while maintaining the feasibility constraints, until energy begins to coalesce into the significant atoms of the sparse model space. Upon reaching this concentration, the method pushes the small energy values to zero and completes the basis-pursuit decomposition. For this reason, models whose answer has a sparse representation converge much more rapidly. If the model space is not sparse, or cannot be sparse, this is the wrong tool.

A particular benefit of adopting this structure to solve the decomposition problem lies in the primal-dual nature of interior-point (as opposed to simplex) LP methods. For any minimization problem, the primal, there is an equally valid maximization problem, its dual, that can be evaluated simultaneously during the iterative solution process. At convergence, the primal and dual objective values are the same. Thus, the gap between the primal and dual objectives, the duality gap, can be evaluated to provide a rigorous test for convergence. The two-loop structure then takes the form of setting up a least-squares problem that solves for the update to the dual variable, then evaluating the new solutions from this update and testing the duality gap. When the difference is less than the tolerance prescribed by the user, the algorithm terminates.
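A minimal sketch of this convergence test, assuming the standard LP primal objective $\mathbf{c}^T\mathbf{x}$ and dual objective $\mathbf{b}^T\mathbf{y}$; the function name and the relative-gap normalization are my own choices, not quantities from the paper.

```python
import numpy as np

def duality_gap_converged(c, x, b, y, tol=1e-6):
    """Convergence test for a primal-dual interior-point iteration.

    For the LP  min c^T x  subject to  Ax = b, x >= 0, the primal objective
    c^T x and the dual objective b^T y coincide at the solution, so the
    normalized gap between them is a rigorous stopping criterion.
    """
    primal = c @ x
    dual = b @ y
    gap = abs(primal - dual) / (1.0 + abs(primal))
    return gap < tol
```

A relative gap is used here so that the same tolerance remains meaningful regardless of the scale of the objective.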

The interior loop uses a CG solver to solve a system of the form
\begin{displaymath}
\mbox{min}_{\Delta\mathbf{y}} \left\Vert
\left( \begin{array}{c} \mathbf{D}^{1/2}\mathbf{A}^T \\ \delta\,\mathbf{I} \end{array} \right) \Delta\mathbf{y}
- \left( \begin{array}{c} \mathbf{D}^{1/2}\left(\mathbf{t} - \mathbf{X}^{-1}\mathbf{v}\right) \\ \mathbf{r}/\delta \end{array} \right)
\right\Vert _2
\end{displaymath} (4)
where the matrix $\mathbf{D}$ and the vectors $\mathbf{t}$, $\mathbf{v}$, and $\mathbf{r}$ are quantities calculated by the IP method, $\mathbf{X}=\mathrm{diag}(\mathbf{x})$, $\Delta\mathbf{y}$ is the barrier-step update for the dual variable, and $\delta$ is an input parameter that controls the regularization of the search direction for the IPLP solver. $\mathbf{A}^T$ is the adjoint of the original operator from equation (3), so we can see that the inner loop evaluates the operator and its adjoint many times. If the operator has neither a sparse-matrix representation nor a fast implicit algorithm, this step makes the method prohibitively expensive. Importantly, because the interior loop solves only for model updates, high precision is not required; any reasonably accurate solution is enough to improve the next barrier step.
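As a sketch of how the inner solve can be organized around an implicit operator, the following uses SciPy's LSQR (a conjugate-gradient-type least-squares solver) on the stacked system of equation (4). The callables A_mv and A_rmv stand in for fast forward and adjoint implementations of the operator; all other arguments are placeholders for quantities the barrier loop would supply, and the names are mine, not the paper's.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def barrier_step(A_mv, A_rmv, m, n, D, t, v, r, X_diag, delta, iters=50):
    """Solve the least-squares system of equation (4) for Delta y.

    A_mv(x) applies A (length n -> m) and A_rmv(y) applies A^T (length
    m -> n) implicitly, so no explicit matrix is ever formed.  D, t, v,
    X_diag have length n; r has length m; delta is the regularization.
    """
    sqrtD = np.sqrt(D)

    # Stacked operator [ D^{1/2} A^T ; delta I ] acting on Delta y.
    def matvec(dy):
        return np.concatenate([sqrtD * A_rmv(dy), delta * dy])

    def rmatvec(z):
        return A_mv(sqrtD * z[:n]) + delta * z[n:]

    M = LinearOperator((n + m, m), matvec=matvec, rmatvec=rmatvec)
    rhs = np.concatenate([sqrtD * (t - v / X_diag), r / delta])

    # A loose iteration limit is fine: only an update direction is needed.
    dy = lsqr(M, rhs, iter_lim=iters)[0]
    return dy
```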

During this brief explanation of the concept, only two user-defined parameters have been mentioned. The first is the desired tolerance associated with the duality gap, and the second is $\delta$. In a realistic implementation there are, of course, a few more parameters. They are largely functions of the two already mentioned, are updated automatically within the IPLP algorithm, or are constants.

The parameter $\delta$ has interesting properties worth discussing. Saunders and Tomlin (1996) describe in detail the mathematics associated with this approach. The regularization of the search direction introduced with $\delta$ actually makes this one of a class of LP problems called perturbed, and introduces a minor change to equation (3). Leaving those details to the reader, I will address the choice of the parameter. Thankfully, BP will converge for $\delta$ anywhere between $4\times10^{-4}$ and 1. If the regularization parameter is small, BP converges slowly, but to a more precise solution that honors the data exactly. As $\delta \rightarrow 1$, the central equations solved in the inner loop become highly damped, resulting in three effects: 1) speedy convergence, 2) effective denoising, and 3) less rigor attached to the ``subject to'' condition in equation (3). The penalty is that the result lacks some of the sparsity and precision of the result obtained with a small value of $\delta$.
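For orientation only, the perturbed problem replaces the hard constraint of equation (3) with a damped one of roughly the following form, where $\mathbf{p}$ is an auxiliary residual vector and $\gamma$ a small damping constant; this restatement is my own paraphrase of the perturbed-LP idea, and Saunders and Tomlin (1996) should be consulted for the precise statement:
\begin{displaymath}
\mbox{min}_{\mathbf{x},\,\mathbf{p}} \; \mathbf{c}^T\mathbf{x}
+ \frac{1}{2}\Vert\gamma\,\mathbf{x}\Vert_2^2
+ \frac{1}{2}\Vert\mathbf{p}\Vert_2^2
\;\;\;\; \mbox{subject to} \;\; \mathbf{A}\mathbf{x} + \delta\,\mathbf{p} = \mathbf{b}, \;\; \mathbf{x} \geq 0 \; .
\end{displaymath}
A larger $\delta$ thus loosens the enforcement of $\mathbf{A}\mathbf{x} = \mathbf{b}$, consistent with the behavior described above.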

