When solving an inverse problem, effects not accounted for in the model may make the problem impossible to solve exactly. For example, if some component of the data $\mathbf{d}$ is in the left null space of the operator $\mathbf{L}$, no model $\mathbf{m}$ can perfectly predict $\mathbf{d}$ (Strang, 1986).
In such cases, the best solution that can be obtained is one that is close to the actual model. For least-squares methods, the sum of the squares of the errors between the recorded data and the data that the model would have produced is taken as the measure of closeness.
In the problems considered here, it is assumed that a large number of measurements have been made and that the solution to the inversion problem is either over-determined or mixed-determined (Menke, 1989).
An over-determined problem is one in which all the components of the solution
are over-determined, so that there will be some inconsistency,
or error, in the data.
A mixed-determined problem
is one in which some of the components in the solution
are over-determined,
while other components are under-determined,
so the problem has errors due to inconsistent measurements
and model parameters that cannot be determined from the data.
Since the problem is at least partially over-determined, there will generally be some error between the data calculated from a model $\mathbf{m}$ and the recorded data $\mathbf{d}$.
In the case of a system $\mathbf{L}\mathbf{m} = \mathbf{d}$, the least-squares solution is the one with the smallest sum of the squares of the differences between the actual data and the data derived from the model to be calculated. This difference to be minimized, the vector of errors $\mathbf{e}$, is defined as $\mathbf{e} = \mathbf{d} - \mathbf{L}\mathbf{m}$, where $\mathbf{m}$ is the model and $\mathbf{d}$ is the data.
The sum of the squares of the error is $E = \mathbf{e}^H\mathbf{e}$, where $^H$ indicates the conjugate transpose, or adjoint. (For purely real $\mathbf{e}$, $^H$ just indicates the transpose.)
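As a quick numerical illustration, a minimal sketch (with an assumed, hypothetical complex-valued error vector) of computing the sum of squared errors via the conjugate transpose:

```python
import numpy as np

# Hypothetical complex-valued error vector, chosen only to illustrate the adjoint.
e = np.array([1 + 2j, 3 - 1j, 0.5j])

# Sum of squared errors: e^H e, using the conjugate transpose.
E = (e.conj().T @ e).real

# For any e, this equals the sum of the squared magnitudes |e_i|^2.
assert np.isclose(E, np.sum(np.abs(e) ** 2))
```

For purely real vectors the conjugation has no effect, and `e.T @ e` gives the same result.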
While $\mathbf{L}$ will later be considered as a matrix operation, for the moment $\mathbf{L}$ may be considered to be any linear operator relating $\mathbf{m}$ to $\mathbf{d}$. To derive a model $\mathbf{m}$, the squared error $\mathbf{e}^H\mathbf{e}$ is minimized.
Expressed in terms of $\mathbf{L}$, $\mathbf{m}$, and $\mathbf{d}$, this becomes

$E = \mathbf{e}^H\mathbf{e}$  (19)

$\;\;\; = (\mathbf{d} - \mathbf{L}\mathbf{m})^H (\mathbf{d} - \mathbf{L}\mathbf{m})$  (20)

$\;\;\; = \mathbf{d}^H\mathbf{d} - \mathbf{d}^H\mathbf{L}\mathbf{m} - \mathbf{m}^H\mathbf{L}^H\mathbf{d} + \mathbf{m}^H\mathbf{L}^H\mathbf{L}\mathbf{m}$  (21)
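The expansion of the squared error into its four terms can be checked numerically; the sketch below uses a small random complex system (all names and sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.normal(size=(5, 3)) + 1j * rng.normal(size=(5, 3))  # assumed operator
m = rng.normal(size=3) + 1j * rng.normal(size=3)            # assumed model
d = rng.normal(size=5) + 1j * rng.normal(size=5)            # assumed data

# Direct form: E = e^H e with e = d - L m.
e = d - L @ m
E_direct = (e.conj() @ e).real

# Expanded form: d^H d - d^H L m - m^H L^H d + m^H L^H L m.
H = lambda x: x.conj().T
E_expanded = (H(d) @ d - H(d) @ L @ m - H(m) @ H(L) @ d + H(m) @ H(L) @ L @ m).real

assert np.isclose(E_direct, E_expanded)
```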
For the $\mathbf{L}\mathbf{m} = \mathbf{d}$ system, there is an interesting connection between taking the minimum of the sum of the squares and the assumption that the errors are independent of each other; it can be shown that the two approaches are equivalent. The least-squares solution can be seen to be the solution that best fits the Gaussian distribution of the error seen above, where the samples of $\mathbf{e}$ are independent. Maximizing $P(\mathbf{e}) \propto \exp(-\mathbf{e}^H\mathbf{e}/2\sigma^2)$ is equivalent to minimizing $-\ln P(\mathbf{e})$, or $\mathbf{e}^H\mathbf{e}$. This becomes the minimization of $(\mathbf{d} - \mathbf{L}\mathbf{m})^H(\mathbf{d} - \mathbf{L}\mathbf{m})$, which is just the least-squares result for $\mathbf{L}\mathbf{m} = \mathbf{d}$. While I will continue with the least-squares approach, the independence of the errors will be emphasized more in a later section.
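This equivalence can be demonstrated numerically. The sketch below (with an assumed real operator and synthetic noisy data) checks that the model minimizing the sum of squared errors also has the smaller negative log-likelihood under independent Gaussian errors:

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.normal(size=(20, 2))                 # assumed linear operator (real case)
m_true = np.array([1.0, -2.0])               # assumed true model
d = L @ m_true + 0.1 * rng.normal(size=20)   # synthetic data with Gaussian noise

def sse(m):
    # Sum of squared errors e^T e.
    e = d - L @ m
    return e @ e

def neg_log_likelihood(m, sigma=0.1):
    # -ln P for independent Gaussian errors: a monotone function of e^T e.
    e = d - L @ m
    return np.sum(e**2 / (2 * sigma**2) + 0.5 * np.log(2 * np.pi * sigma**2))

m_ls = np.linalg.lstsq(L, d, rcond=None)[0]  # least-squares model
m_other = m_ls + np.array([0.3, -0.3])       # any perturbed model

# The least-squares model minimizes both criteria.
assert sse(m_ls) < sse(m_other)
assert neg_log_likelihood(m_ls) < neg_log_likelihood(m_other)
```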
If $\mathbf{L}$ is a matrix and $\mathbf{m}$ and $\mathbf{d}$ are vectors, we get the minimum of $E$ by setting its derivative with respect to $\mathbf{m}$ to zero. Once again this minimum occurs when $\mathbf{L}^H\mathbf{L}\mathbf{m} = \mathbf{L}^H\mathbf{d}$, which is the expression for the least-squares inverse referred to as the normal equations (Strang, 1988).
To find $\mathbf{m}$, the inverse of $\mathbf{L}^H\mathbf{L}$ must be taken to get $\mathbf{m} = (\mathbf{L}^H\mathbf{L})^{-1}\mathbf{L}^H\mathbf{d}$. This leaves the somewhat simpler problem of calculating $(\mathbf{L}^H\mathbf{L})^{-1}$.
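The normal-equations solution can be sketched in matrix form as follows (a small random real system is assumed, so the adjoint reduces to the transpose):

```python
import numpy as np

rng = np.random.default_rng(2)
L = rng.normal(size=(6, 3))   # assumed over-determined matrix operator
d = rng.normal(size=6)        # assumed recorded data

# Normal equations: L^H L m = L^H d, i.e. m = (L^H L)^{-1} L^H d.
LhL = L.T @ L                 # real L, so L^H is just L^T
m = np.linalg.solve(LhL, L.T @ d)

# Agrees with a library least-squares solve.
m_ref = np.linalg.lstsq(L, d, rcond=None)[0]
assert np.allclose(m, m_ref)

# At the minimum, the residual is orthogonal to the columns of L: L^H e = 0.
e = d - L @ m
assert np.allclose(L.T @ e, 0, atol=1e-10)
```

In practice one solves the linear system rather than forming the explicit inverse, but the structure mirrors the expression above.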