
Short Note
A simple example of a null space
and how to modify it

Dave Nichols


In many situations in geophysics we seek a solution to a problem of the form

\begin{displaymath}
{\bf B} {\bf m} = {\bf d},
\end{displaymath}

where ${\bf B}$ is a linear operator that relates a model, ${\bf m}$, to the data, ${\bf d}$. The null space of ${\bf B}$ is the space ${\bf N}$ for which any vector ${\bf n} \in {\bf N}$ satisfies

\begin{displaymath}
{\bf B} {\bf n} = {\bf 0}.
\end{displaymath}

If the problem is under-determined it will have a non-trivial null space. When a vector ${\bf \hat{m}}$ is a valid solution to our problem we can add an arbitrary amount of the null space to it and it will remain an equally valid solution.

\begin{displaymath}
{\bf B} ( {\bf \hat{m}} + \alpha {\bf n} ) = {\bf d}, \qquad {\bf n} \in {\bf N}.
\end{displaymath}

The question we want to answer is: ``how much of the null space should be added to the solution?''

The SVD explanation

The properties of the null space can be explored by examining the singular value decomposition of the operator. The SVD of B is given by

\begin{displaymath}
{\bf B} = {\bf U} {\bf \Lambda} {\bf V}^\dagger.
\end{displaymath}

The matrix ${\bf \Lambda}$ is a diagonal matrix whose entries are the singular values of ${\bf B}$. The columns of the matrices ${\bf U}$ and ${\bf V}$ form orthonormal bases for the data and model spaces respectively. The null space of ${\bf B}$ is the space spanned by the vectors associated with singular values of zero.

An important property of the adjoint operator ${\bf B}^\dagger$ is that, when applied to any vector, it will always produce an output that has no component in the null space. This is easily shown by considering the SVD of ${\bf B}$. The data vector ${\bf d}$ can be written as a linear combination of the basis vectors in the data space.

\begin{displaymath}
{\bf d} = \sum \alpha_i {\bf u}_i .
\end{displaymath}

When the adjoint operator is applied to this vector we obtain

\begin{displaymath}
{\bf g} = {\bf B}^\dagger {\bf d} = \sum \alpha_i \lambda_i {\bf v}_i,
\end{displaymath}

i.e. the basis vectors in the model space are scaled by their singular value. Any basis vector associated with a singular value of zero will have no component in the resulting vector.
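This property is easy to verify numerically. A minimal NumPy sketch, using the same $1 \times 2$ operator ${\bf B} = \pmatrix{1 & 1}$ that serves as the example in the next section:

```python
import numpy as np

# The 1x2 operator from the example problem: B = [1 1].
B = np.array([[1.0, 1.0]])

# Full SVD: B = U diag(s) Vt (NumPy returns V-transpose).
U, s, Vt = np.linalg.svd(B, full_matrices=True)
v_null = Vt[1]  # the row of Vt with no associated singular value: the null space

# B annihilates the null-space vector ...
assert np.allclose(B @ v_null, 0.0)

# ... and the adjoint applied to any data vector has no null-space component.
d = np.array([1.0])
g = B.T @ d
assert np.allclose(np.dot(g, v_null), 0.0)
```

The sign of `v_null` returned by `svd` is arbitrary, so the checks test only annihilation and orthogonality, which are sign-independent.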

An example problem

Consider the trivial under-determined problem

\begin{displaymath}
\pmatrix{ 1 & 1 } \pmatrix{ m_1 \cr m_2 } = \pmatrix{ 1 }.
\end{displaymath}

This operator has two model-space basis vectors, one of which is in the null space.

The first vector, with a singular value of $\sqrt{2}$, is

\begin{displaymath}
\pmatrix{ 1/\sqrt{2} \cr 1/\sqrt{2} } .
\end{displaymath}

The second vector, with a singular value of zero and thus in the null space, is

\begin{displaymath}
\pmatrix{ -1/\sqrt{2} \cr 1/\sqrt{2} } .
\end{displaymath}

One possible solution to the problem is made purely of the first vector,

\begin{displaymath}
\pmatrix{ 1 & 1 } \pmatrix{ 1/2 \cr 1/2 } = \pmatrix{ 1 } .
\end{displaymath}

An arbitrary amount of the second vector can be added to the solution

\begin{displaymath}
\pmatrix{ 1 & 1 } \left[ \pmatrix{ 1/2 \cr 1/2 } + \alpha \ \pmatrix{ -1 \cr 1 } \right] = \pmatrix{ 1 } .
\end{displaymath}
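A quick numerical check of this identity: adding any multiple of the null-space direction $(-1, 1)$ leaves the predicted data unchanged.

```python
import numpy as np

B = np.array([[1.0, 1.0]])
m_hat = np.array([0.5, 0.5])   # a particular solution
n = np.array([-1.0, 1.0])      # null-space direction

# The data prediction is the same for every alpha.
for alpha in (0.0, -3.0, 7.5):
    assert np.allclose(B @ (m_hat + alpha * n), [1.0])
```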


The minimum length solution is the solution with no null space component. These solutions are preferred by some people because they contain no information about the solution that is not a linear function of the data.

Any solution created by an iterative solver can be expressed in terms of a series expansion:

\begin{displaymath}
{\bf m} = {\bf m_0} + \alpha_0 {\bf B}^\dagger {\bf r_0} + \alpha_1 ({\bf B}^\dagger {\bf B}) {\bf B}^\dagger {\bf r_0} + \cdots + \alpha_n ({\bf B}^\dagger {\bf B})^n {\bf B}^\dagger {\bf r_0} + \cdots,
\end{displaymath}

where ${\bf r_0}$ is the initial residual calculated from the initial solution ${\bf m_0}$,

\begin{displaymath}
{\bf r_0} = {\bf d} - {\bf B}{\bf m_0} .
\end{displaymath}

The coefficients are calculated in a manner that depends on the algorithm.

Since the last operation in each component of this solution is always the adjoint operator, we can immediately see that this type of solver will never add any information to the null space. If the initial solution is zero, then the final solution will have no component in the null space and it will be the minimum length solution.
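This can be seen in code. A minimal steepest-descent sketch (the note does not prescribe a particular solver, so this choice is an assumption): every update is ${\bf B}^\dagger$ applied to something, so the iterate stays orthogonal to the null-space direction when the starting model is zero.

```python
import numpy as np

B = np.array([[1.0, 1.0]])
d = np.array([1.0])
n = np.array([-1.0, 1.0]) / np.sqrt(2.0)   # unit null-space vector

m = np.zeros(2)                            # zero initial solution
for _ in range(5):
    r = d - B @ m                          # residual
    g = B.T @ r                            # gradient = adjoint of B on residual
    Bg = B @ g
    if np.dot(Bg, Bg) == 0.0:              # converged: gradient is zero
        break
    alpha = np.dot(Bg, r) / np.dot(Bg, Bg) # exact line search
    m = m + alpha * g

assert np.allclose(m, [0.5, 0.5])          # the minimum-length solution
assert abs(np.dot(m, n)) < 1e-12           # no null-space component
```

For this rank-one problem the loop converges after a single step, as the next section works out by hand.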

Iterative solution of the model problem

The model problem is solved in one iteration by almost any algorithm since the operator is of rank one. I will illustrate the properties of the solution using a steepest descent algorithm. The first step applies the adjoint operator to the residual to get a gradient vector. Assuming an initial solution of zero, the residual is the same as the data.

\begin{displaymath}
{\bf r_0} = {\bf d} - {\bf B}{\bf m_0} = \pmatrix{1} - \pmatrix{ 1 & 1 } \pmatrix{ 0 \cr 0 } = \pmatrix{1}.
\end{displaymath}

Then the gradient is given by,

\begin{displaymath}
{\bf g} = {\bf B}^\dagger {\bf r_0} = \pmatrix{ 1 \cr 1 } \pmatrix{1} = \pmatrix{ 1 \cr 1 }.
\end{displaymath}

The step length is chosen to minimize the residual

\begin{displaymath}
\min_\alpha \vert \alpha {\bf B} {\bf g} - {\bf d} \vert = \min_\alpha \vert \alpha \times 2 - 1 \vert.
\end{displaymath}

Thus a step length of 1/2 gives the solution

\begin{displaymath}
{\bf m_1} = {\bf m_0} + \alpha {\bf g} = \pmatrix{ 0 \cr 0 } + \frac{1}{2} \pmatrix{ 1 \cr 1 } = \pmatrix{ 1/2 \cr 1/2 }.
\end{displaymath}

The answer may be different if the initial solution is not zero. If the initial solution has energy in the null space, then that energy will remain in the final solution. An initial solution chosen arbitrarily is,

\begin{displaymath}
{\bf m_0} = \pmatrix{ 1 \cr 0 } = \frac{1}{\sqrt{2}} \pmatrix{ 1/\sqrt{2} \cr 1/\sqrt{2} } - \frac{1}{\sqrt{2}} \pmatrix{ -1/\sqrt{2} \cr 1/\sqrt{2} } .
\end{displaymath}

This initial solution has half its energy in the null space. Since this solution has a residual of zero, it is the final solution as well.
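A sketch of this last point: the arbitrary starting model $(1, 0)$ already fits the data exactly, so a solver takes no step, and the null-space half of its energy survives untouched.

```python
import numpy as np

B = np.array([[1.0, 1.0]])
d = np.array([1.0])
n = np.array([-1.0, 1.0]) / np.sqrt(2.0)   # unit null-space vector

m0 = np.array([1.0, 0.0])
r0 = d - B @ m0
assert np.allclose(r0, 0.0)                # zero residual: the solver stops here

# Half the model energy (|m0|^2 = 1) lies along the null-space direction.
assert np.isclose(np.dot(m0, n) ** 2, 0.5)
```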


Left preconditioners

Many methods for solving this problem introduce a premultiplying operator to the problem

\begin{displaymath}
{\bf M_L} {\bf B} {\bf m} = {\bf M_L} {\bf d} .
\end{displaymath}

This operator has different interpretations:

It can be thought of as a preconditioner dependent on the operator.
Tarantola defines a ``data covariance'' operator that is used in this way. It expresses the reliability and the correlation of the different data measurements.
In iteratively reweighted least squares (for the $L_p$ norm), it is a data variance measure that is estimated from the current residual.
Claerbout calls it a ``change of regression''.

However, whatever the derivation, it does not change the fact that an iterative solver will put no new energy into the null space of ${\bf B}$. The adjoint operation is ${\bf B}^\dagger {\bf M_L}^\dagger$. The last operator applied is ${\bf B}^\dagger$ so there is no null space energy in the result.
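A sketch of this argument with a hypothetical left weight `M_L` (any invertible $1 \times 1$ matrix works for this one-row example; the value 3.7 is an arbitrary assumption): the adjoint of $({\bf M_L} {\bf B})$ is ${\bf B}^\dagger {\bf M_L}^\dagger$, and because ${\bf B}^\dagger$ acts last, the output is always orthogonal to the null space.

```python
import numpy as np

B = np.array([[1.0, 1.0]])
M_L = np.array([[3.7]])           # arbitrary left weight (an assumption)
n = np.array([-1.0, 1.0])         # null-space direction of B
d = np.array([1.0])

# Adjoint of (M_L B) applied to the data: B^T acts last.
g = B.T @ (M_L.T @ d)
assert np.isclose(np.dot(g, n), 0.0)
```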


Right preconditioners

In contrast, a right preconditioner can change the energy in the null space of the solution. Now the problem to be solved is

\begin{displaymath}
{\bf B} {\bf M_R} {\bf \hat{m}} = {\bf d} ,
\end{displaymath}

followed by

\begin{displaymath}
{\bf m} = {\bf M_R} {\bf \hat{m}}.
\end{displaymath}

Again, there are many reasons given for using this type of operator:

A right preconditioner dependent on the operator.
A ``quelling'' operator to stabilize the solution.
Claerbout calls it ``a change of variable in the model'', e.g. ``FROB''.

When the adjoint of this operator is applied it may produce energy in the null space of ${\bf B}$. The adjoint operation is ${\bf M_R}^\dagger {\bf B}^\dagger$. When this operator is used in an iterative solver the solution will have no energy in the null space of the composite operator $({\bf B} {\bf M_R})$. The length that is minimized is the norm of the solution in the transformed space, not the norm in the original space.

\begin{displaymath}
\min_{{\bf \hat{m}}} \vert {\bf \hat{m}} \vert = \min_{{\bf m}} \vert {\bf M_R}^+ {\bf m} \vert .
\end{displaymath}

${\bf M_R}^+$ is a pseudo-inverse of ${\bf M_R}$. Since ${\bf M_R}$ should always have rank equal to the length of ${\bf m}$, a well-behaved pseudo-inverse will always exist.

Right preconditioning of the example problem

The example problem can be solved with a diagonal right preconditioner. For no particular reason I have decided that I prefer to have more energy in the first unknown so I choose the preconditioner

\begin{displaymath}
{\bf M_R} = \pmatrix{ 10 & 0 \cr 0 & 1 }.
\end{displaymath}

Starting with the same initial solution as before, zero, I find the gradient,

\begin{displaymath}
{\bf g} = {\bf M_R}^\dagger {\bf B}^\dagger {\bf r_0} = \pmatrix{ 10 & 0 \cr 0 & 1 } \pmatrix{ 1 \cr 1 } \pmatrix{1} = \pmatrix{ 10 \cr 1 }.
\end{displaymath}

The step length is chosen to minimize the residual

\begin{displaymath}
\min_\alpha \vert \alpha \times 101 - 1 \vert.
\end{displaymath}

A step length of 1/101 gives the solution

\begin{displaymath}
{\bf \hat{m}_1} = \pmatrix{ 0 \cr 0 } + \frac{1}{101} \pmatrix{ 10 \cr 1 } = \pmatrix{ 10/101 \cr 1/101 }.
\end{displaymath}

This solution is projected back to the original model space to give

\begin{displaymath}
{\bf m_1} = \pmatrix{ 10 & 0 \cr 0 & 1 } \pmatrix{ 10/101 \cr 1/101 } = \pmatrix{ 100/101 \cr 1/101 }.
\end{displaymath}

The solution has lots of energy in the null space of ${\bf B}$ but no energy in the null space of ${\bf B} {\bf M_R}$.
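The whole right-preconditioned step can be reproduced numerically; a sketch of one steepest-descent iteration on the composite operator, mapped back through ${\bf M_R}$:

```python
import numpy as np

B = np.array([[1.0, 1.0]])
M_R = np.diag([10.0, 1.0])
d = np.array([1.0])
A = B @ M_R                          # composite operator: [10 1]

m_hat = np.zeros(2)
r = d - A @ m_hat                    # residual = data for a zero start
g = A.T @ r                          # gradient in the preconditioned space: (10, 1)
Ag = A @ g
alpha = np.dot(Ag, r) / np.dot(Ag, Ag)   # exact line search: 1/101
m = M_R @ (m_hat + alpha * g)        # back to the original model space

assert np.allclose(m, [100 / 101, 1 / 101])
# The result does carry energy along B's null-space direction (-1, 1).
assert not np.isclose(np.dot(m, [-1.0, 1.0]), 0.0)
```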


Model penalty operators

An alternative strategy for dealing with the null space is to augment the problem with a model penalty operator, ${\bf P}$.

We now seek the solution to the problem

\begin{displaymath}
\min_{{\bf m}} \vert {\bf P} {\bf m} \vert
\end{displaymath}

subject to

\begin{displaymath}
\min_{{\bf m}} \vert {\bf B} {\bf m} - {\bf d} \vert .
\end{displaymath}

The penalty operator can be used to turn an underdetermined problem into an overdetermined problem.

\begin{displaymath}
\pmatrix{ {\bf B} \cr \epsilon {\bf P} } {\bf m} \approx \pmatrix{ {\bf d} \cr {\bf 0} } .
\end{displaymath}

Tarantola views these penalty operators as ``inverse model covariance operators''. I view them more generally; they describe something that I don't like about the model (e.g. roughness, smoothness, curvature, energy in one dip) that I want to minimize. Claerbout would like to estimate PEFs (prediction-error filters) to use as penalty operators at the same time as he estimates the solution to the problem.

The sample problem with a model penalty operator

I can choose a simple penalty function that will minimize the energy in the second unknown

\begin{displaymath}
{\bf P} = \pmatrix{ 1 & 0 \cr 0 & 5 }.
\end{displaymath}

Then the augmented problem, with $\epsilon = .1$, is

\begin{displaymath}
\pmatrix{ \pmatrix{ 1 & 1 } \cr \pmatrix{ .1 & 0 \cr 0 & .5 } } \pmatrix{ m_1 \cr m_2 } = \pmatrix{ \pmatrix{ 1 } \cr \pmatrix{ 0 \cr 0 } }.
\end{displaymath}

This can be rewritten as

\begin{displaymath}
\pmatrix{ 1 & 1 \cr .1 & 0 \cr 0 & .5 } \pmatrix{ m_1 \cr m_2 } = \pmatrix{ 1 \cr 0 \cr 0 }.
\end{displaymath}

The solution to this problem is

\begin{displaymath}
{\bf m} = \pmatrix{ .9524 \cr .0381 }.
\end{displaymath}

By a suitable choice of penalty function we have reduced the energy in the second model parameter. However, the scaling of my penalty function was somewhat arbitrary. I would have obtained a different result if I had scaled my operator ${\bf P}$ differently.
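The augmented system above is an ordinary least-squares problem, so its solution can be checked directly:

```python
import numpy as np

# B stacked on eps * P, with eps = .1 and P = diag(1, 5).
A = np.array([[1.0, 1.0],
              [0.1, 0.0],
              [0.0, 0.5]])
d = np.array([1.0, 0.0, 0.0])

m, *_ = np.linalg.lstsq(A, d, rcond=None)
# Normal equations give m = (0.25/0.2625, 0.01/0.2625) ~ (.9524, .0381).
assert np.allclose(m, [0.25 / 0.2625, 0.01 / 0.2625])
```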


Discussion

What is the best thing to do? I'm not sure. A good initial guess is obviously a good thing to have, but you had better be sure that you like the null-space component of your initial guess, as it isn't going to change.

The preconditioning approach has the attractive feature that you don't need to choose a scale factor. It has the limitation that the preconditioner must be full rank; you must not remove possible energy from the model.

The penalty function approach is much more general. It allows you to use any operator as a penalty function, but you have the added burden of choosing a good scale factor for that function.

Stanford Exploration Project