
Least-squares solutions to inverse problems

When solving an inverse problem, effects not accounted for in the model may make it impossible to solve the problem exactly. For example, if some component of $\sv d$ lies in the left null space of $\st A$, no model $\sv m$ can perfectly predict $\sv d$ (Strang, 1986). In such cases, the best that can be obtained is a solution that is close to the actual model. For least-squares methods, the measure of closeness is the sum of the squares of the differences between the recorded data and the data the model predicts.

In the problems considered here, it is assumed that a large number of measurements have been made and that the inversion problem is either over-determined or mixed-determined (Menke, 1989). An over-determined problem is one in which all the components of the solution are over-determined, so there will be some inconsistency, or error, in the data. A mixed-determined problem is one in which some components of the solution are over-determined while others are under-determined, so the problem has both errors from inconsistent measurements and model parameters that cannot be determined from the data. Since the problem is at least partially over-determined, there will generally be some error between the data calculated from a model, $\sv d_{\rm{calc}} = \st A \sv m$, and the recorded data $\sv d$.

For a system $\st A \sv m = \sv d$, the least-squares solution is the one that minimizes the sum of the squares of the differences between the recorded data and the data predicted by the model. This difference to be minimized, the vector of errors $\sv e$, is defined as $\sv e = \st A \sv m - \sv d$, where $\sv m$ is the model and $\sv d$ is the data. The sum of the squares of the error is $\sv e^{\dagger} \sv e$, where $\dagger$ indicates the conjugate transpose, or adjoint. (For purely real $\sv e$, $\dagger$ indicates just the transpose.)
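As an illustration, here is a minimal numerical sketch of the error vector and the squared error $\sv e^{\dagger} \sv e$. The small complex-valued $\st A$, $\sv m$, and $\sv d$ are assumptions made only for this example, and a matrix is used for concreteness, anticipating the matrix treatment later in this section.
\begin{verbatim}
# Minimal sketch: the error vector e = A m - d and the squared error e^dagger e.
# The small complex-valued A, m, and d below are illustrative assumptions only.
import numpy as np

A = np.array([[1.0 + 1.0j, 2.0],
              [3.0,        4.0 - 2.0j],
              [5.0,        6.0]])       # three data values, two model parameters
m = np.array([0.5, -1.0 + 0.5j])        # a trial model
d = np.array([1.0, 0.0 + 1.0j, 2.0])    # recorded data

e = A @ m - d                           # error vector e = A m - d
squared_error = np.vdot(e, e).real      # e^dagger e; vdot conjugates its first argument
print(squared_error)
\end{verbatim}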

While $\st A$ will later be treated as a matrix, for the moment it may be considered to be any linear operator relating $\sv m$ to $\sv d$. To derive a model $\sv m$, the squared error $\sv e^{\dagger} \sv e$ is minimized. Expressed in terms of $\st A$, $\sv d$, and $\sv m$, this becomes
\begin{displaymath}
\sv e^{\dagger} \sv e = (\st A\sv m-\sv d)^{\dagger} (\st A\sv m-\sv d).\end{displaymath} (19)
Expanding this produces
\begin{displaymath}
\sv e^{\dagger} \sv e = \sv m^{\dagger} \st A^{\dagger} \st A \sv m + \sv d^{\dagger} \sv d - \sv d^{\dagger} \st A \sv m - \sv m^{\dagger} \st A^{\dagger} \sv d.\end{displaymath} (20)
To minimize $\sv e^{\dagger} \sv e$ with respect to $\sv m$, the derivative can be set to zero with respect to either $\sv m^{\dagger}$ or $\sv m$; differentiating with respect to $\sv m^{\dagger}$ is more convenient. The derivative of the previous expression then becomes
\begin{displaymath}
{\partial\over \partial \sv m^{\dagger} } (\sv e^{\dagger} \sv e) = \st A^{\dagger} \st A \sv m - \st A^{\dagger} \sv d = 0,\end{displaymath} (21)
producing $\st A^{\dagger} \st A \sv m = \st A^{\dagger} \sv d$. The value of $\sv m$ that minimizes $\sv e^{\dagger} \sv e$ is then $\sv m = (\st A^{\dagger} \st A)^{-1} \st A^{\dagger} \sv d$.
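For completeness, the term-by-term step behind equation (21) (a short worked check, not part of the original derivation) is
\begin{displaymath}
{\partial\over \partial \sv m^{\dagger} } \left( \sv m^{\dagger} \st A^{\dagger} \st A \sv m \right) = \st A^{\dagger} \st A \sv m ,
\qquad
{\partial\over \partial \sv m^{\dagger} } \left( \sv m^{\dagger} \st A^{\dagger} \sv d \right) = \st A^{\dagger} \sv d ,
\end{displaymath}
while $\sv d^{\dagger} \sv d$ and $\sv d^{\dagger} \st A \sv m$ contain no factor of $\sv m^{\dagger}$ and so contribute nothing to the derivative.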

For the $\sv r = \sv f \ast \sv d$ system, there is an interesting connection between minimizing the sum of the squares and the assumption that the errors are independent of each other: the two approaches turn out to be equivalent. The least-squares solution is the solution that best fits the Gaussian distribution of the error $p(\sv r) \propto e^{-\sv r^{\dagger} \sigma_r^{-2} \st I \sv r }$ seen above, in which the samples of $\sv r$ are independent. Maximizing $p(\sv r)$ is equivalent to minimizing $-\log(p(\sv r))$, or $\sv r^{\dagger} \sigma_r^{-2} \st I \sv r$. This reduces to the minimization of $\sv r^{\dagger} \sv r$, which is just the least-squares result for $\sv r = \st F \sv d$. While I will continue with the least-squares approach, the independence of the errors will be emphasized more in section [*].
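Written out under the Gaussian form quoted above (a short check; the proportionality constant is absorbed into the additive constant), the quantity being minimized is
\begin{displaymath}
-\log p(\sv r) = \sigma_r^{-2} \, \sv r^{\dagger} \st I \sv r + \mathrm{const} = \sigma_r^{-2} \, \sv r^{\dagger} \sv r + \mathrm{const} ,
\end{displaymath}
and neither the positive factor $\sigma_r^{-2}$ nor the additive constant changes which $\sv r$ attains the minimum, leaving the minimization of $\sv r^{\dagger} \sv r$.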

If $\st A$ is a matrix and $\sv d$ and $\sv m$ are vectors, the minimum of $\sv e^{\dagger} \sv e$ is again found by minimizing $(\st A \sv m - \sv d)^{\dagger} (\st A \sv m - \sv d)$. Once again this minimum occurs when $(\st A^{\dagger} \st A) \sv m = \st A^{\dagger} \sv d$, which is the expression for the least-squares inverse referred to as the normal equations (Strang, 1988). To find $\sv m$, the inverse of $(\st A^{\dagger} \st A)$ must be applied to get $\sv m = (\st A^{\dagger} \st A)^{-1} \st A^{\dagger} \sv d$. This leaves the somewhat simpler problem of calculating $(\st A^{\dagger} \st A)^{-1}$.
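As an illustration, the following is a minimal numpy sketch of this matrix form; the sizes and the random test data are assumptions made here for the example only. Rather than forming $(\st A^{\dagger} \st A)^{-1}$ explicitly, it solves the normal equations as a linear system and cross-checks the result against a library least-squares routine.
\begin{verbatim}
# Minimal sketch: least squares via the normal equations (A^dagger A) m = A^dagger d.
# The sizes and random data below are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))     # over-determined: 100 data, 5 model parameters
d = rng.standard_normal(100)          # recorded data

AtA = A.conj().T @ A                  # A^dagger A
Atd = A.conj().T @ d                  # A^dagger d
m_normal = np.linalg.solve(AtA, Atd)  # solve the system instead of inverting A^dagger A

m_lstsq = np.linalg.lstsq(A, d, rcond=None)[0]   # library cross-check
print(np.allclose(m_normal, m_lstsq))            # expected: True for this example
\end{verbatim}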

