When solving an inverse problem, effects not accounted for in the model may make the problem impossible to solve exactly. For example, if some component of the data $\mathbf{d}$ is in the left null space of the operator $\mathbf{A}$, no model can perfectly predict $\mathbf{d}$ (Strang, 1986). In such cases, a solution that is close to the actual model is the best solution that can be obtained. For least-squares methods, the sum of the squares of the errors between the recorded data and the data predicted by the model is taken as the measure of closeness.
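The left-null-space remark can be illustrated with a small numerical sketch. The 3-by-2 matrix `A`, the data vector `d`, and the vector `n` below are hypothetical values chosen for illustration and do not come from the text. The part of the data along `n` survives as a residual whatever model is chosen.

```python
import numpy as np

# Hypothetical 3x2 operator: three measurements, two model parameters.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# The left null space of A holds the vectors n with A^T n = 0.
n = np.array([0.0, 0.0, 1.0])
assert np.allclose(A.T @ n, 0.0)

# Hypothetical data with a component along n; no model can reproduce it.
d = np.array([2.0, 3.0, 5.0])

# Least-squares model and the part of the data it cannot explain.
m, *_ = np.linalg.lstsq(A, d, rcond=None)
residual = d - A @ m

print("model:", m)
print("residual:", residual)
```

The residual lies entirely along `n`, so the best the inversion can do is fit the remaining components of the data.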
In the problems considered here, it is assumed that a large number of measurements have been made and that the solution to the inversion problem is either over-determined or mixed-determined (Menke, 1989). An over-determined problem is one in which all the components of the solution are over-determined, so that there will be some inconsistency, or error, in the data. A mixed-determined problem is one in which some of the components of the solution are over-determined while other components are under-determined, so the problem has both errors due to inconsistent measurements and model parameters that cannot be determined from the data. Since the problem is at least partially over-determined, there will generally be some error between the data calculated from a model and the recorded data $\mathbf{d}$.
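For a matrix operator, the distinction between the two cases can be sketched with a rank test. The function `classify` and its rule are an illustrative simplification, not a definition taken from Menke (1989): full column rank with extra rows is treated as over-determined, and a rank below both the number of data and the number of model parameters is treated as mixed-determined.

```python
import numpy as np

def classify(A, tol=1e-10):
    """Roughly classify the system d = A m from the shape and rank of A."""
    n_data, n_model = A.shape
    rank = np.linalg.matrix_rank(A, tol=tol)
    if rank == n_model and n_data > n_model:
        # Every model parameter is constrained, with redundant data.
        return "over-determined"
    if rank < n_model and rank < n_data:
        # Some parameters are unconstrained, yet the data can still conflict.
        return "mixed-determined"
    return "other"

# Hypothetical examples: independent columns, then two identical columns.
A_over = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A_mixed = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])

print(classify(A_over))
print(classify(A_mixed))
```

In the second example only the sum of the two model parameters is constrained by the data, while the three measurements of that sum may still disagree with each other.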
In the case of a system $\mathbf{d} = \mathbf{A}\mathbf{m}$, the least-squares solution is the one with the smallest sum of the squares of the differences between the actual data and the data derived from the model to be calculated. This difference to be minimized, the vector of errors $\mathbf{e}$, is defined as $\mathbf{e} = \mathbf{d} - \mathbf{A}\mathbf{m}$, where $\mathbf{m}$ is the model and $\mathbf{d}$ is the data. The sum of the squares of the error is $\mathbf{e}^\dagger\mathbf{e}$, where $\dagger$ indicates the conjugate transpose, or adjoint. (For purely real $\mathbf{e}$, $\dagger$ just indicates the transpose.)
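While $\mathbf{A}$ will later be considered as a matrix operation, for the moment $\mathbf{A}$ may be considered to be any linear operator relating $\mathbf{m}$ to $\mathbf{d}$. To derive a model $\mathbf{m}$, the squared error $\mathbf{e}^\dagger\mathbf{e}$ is minimized. Expressed in terms of $\mathbf{A}$, $\mathbf{m}$, and $\mathbf{d}$, this becomes the minimization of $(\mathbf{d} - \mathbf{A}\mathbf{m})^\dagger(\mathbf{d} - \mathbf{A}\mathbf{m})$.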
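The minimization can be written out as a short derivation. This is a sketch under two assumptions that are not confirmed by the text: the sign convention $\mathbf{e} = \mathbf{d} - \mathbf{A}\mathbf{m}$ (the opposite sign gives the same squared error), and the treatment of $\mathbf{m}$ and $\mathbf{m}^\dagger$ as independent variables when differentiating.

```latex
\begin{align}
\mathbf{e}^\dagger\mathbf{e}
  &= (\mathbf{d}-\mathbf{A}\mathbf{m})^\dagger(\mathbf{d}-\mathbf{A}\mathbf{m}) \\
  &= \mathbf{d}^\dagger\mathbf{d}
     - \mathbf{d}^\dagger\mathbf{A}\mathbf{m}
     - \mathbf{m}^\dagger\mathbf{A}^\dagger\mathbf{d}
     + \mathbf{m}^\dagger\mathbf{A}^\dagger\mathbf{A}\mathbf{m}, \\
\frac{\partial\,(\mathbf{e}^\dagger\mathbf{e})}{\partial\,\mathbf{m}^\dagger}
  &= -\mathbf{A}^\dagger\mathbf{d} + \mathbf{A}^\dagger\mathbf{A}\mathbf{m} = \mathbf{0}
  \quad\Longrightarrow\quad
  \mathbf{A}^\dagger\mathbf{A}\mathbf{m} = \mathbf{A}^\dagger\mathbf{d}.
\end{align}
```

The stationary point is a minimum because $\mathbf{A}^\dagger\mathbf{A}$ is positive semidefinite.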
For the system $\mathbf{d} = \mathbf{A}\mathbf{m}$, there is an interesting connection between taking the minimum of the sum of the squares and the assumption that the errors are independent of each other. It can be shown that the two approaches are equivalent. The least-squares solution can be seen to be the solution that best fits the Gaussian distribution of the error seen above, where the samples of $\mathbf{e}$ are independent. Maximizing the probability $p(\mathbf{e})$ is equivalent to minimizing $\sum_i |e_i|^2$, or $\mathbf{e}^\dagger\mathbf{e}$. This becomes the minimization of $(\mathbf{d} - \mathbf{A}\mathbf{m})^\dagger(\mathbf{d} - \mathbf{A}\mathbf{m})$, which is just the least-squares result for $\mathbf{d} = \mathbf{A}\mathbf{m}$. While I will continue with the least-squares approach, the independence of the errors will be emphasized more in a later section.
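The equivalence can be sketched in a few lines. The symbols $p$, $\sigma$, and $N$ are assumptions introduced here for illustration: the $N$ error samples $e_i$ are taken to be independent, zero-mean, real, and Gaussian with a common variance $\sigma^2$.

```latex
\begin{align}
p(\mathbf{e})
  &= \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}
     \exp\!\left(-\frac{e_i^2}{2\sigma^2}\right)
   = \frac{1}{(2\pi\sigma^2)^{N/2}}
     \exp\!\left(-\frac{\mathbf{e}^\dagger\mathbf{e}}{2\sigma^2}\right), \\
-\ln p(\mathbf{e})
  &= \frac{N}{2}\ln(2\pi\sigma^2) + \frac{\mathbf{e}^\dagger\mathbf{e}}{2\sigma^2}.
\end{align}
```

Since the first term does not depend on the model, maximizing $p(\mathbf{e})$ over the model is the same as minimizing $\mathbf{e}^\dagger\mathbf{e}$. Independence is what allows the joint probability to be written as a product in the first place.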
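If $\mathbf{A}$ is a matrix and $\mathbf{m}$ and $\mathbf{d}$ are vectors, we get the minimum of $\mathbf{e}^\dagger\mathbf{e}$ by minimizing $(\mathbf{d} - \mathbf{A}\mathbf{m})^\dagger(\mathbf{d} - \mathbf{A}\mathbf{m})$. Once again this minimum occurs when $\mathbf{A}^\dagger\mathbf{A}\mathbf{m} = \mathbf{A}^\dagger\mathbf{d}$, which is the expression for the least-squares inverse referred to as the normal equations (Strang, 1988). To find $\mathbf{m}$, the inverse of $\mathbf{A}^\dagger\mathbf{A}$ must be taken to get $\mathbf{m} = (\mathbf{A}^\dagger\mathbf{A})^{-1}\mathbf{A}^\dagger\mathbf{d}$. This leaves the somewhat simpler problem of calculating $(\mathbf{A}^\dagger\mathbf{A})^{-1}$.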
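As a numerical check, the normal equations can be solved for a small complex-valued system and compared with a library least-squares routine. The matrix size, random seed, noise level, and the names `m_true`, `m_normal`, and `m_lstsq` are all hypothetical choices made for this sketch. It solves the linear system directly rather than forming the inverse explicitly, which is a common numerical preference and not something the text prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical over-determined system: 20 complex data, 3 model parameters.
A = rng.standard_normal((20, 3)) + 1j * rng.standard_normal((20, 3))
m_true = np.array([1.0 + 2.0j, -0.5j, 3.0])
noise = 0.01 * (rng.standard_normal(20) + 1j * rng.standard_normal(20))
d = A @ m_true + noise

# Normal equations: (A^dagger A) m = A^dagger d, with dagger the conjugate transpose.
AhA = A.conj().T @ A
Ahd = A.conj().T @ d
m_normal = np.linalg.solve(AhA, Ahd)

# Reference solution from NumPy's least-squares solver.
m_lstsq, *_ = np.linalg.lstsq(A, d, rcond=None)

# At the minimum, the error vector is orthogonal to the columns of A.
e = d - A @ m_normal

print("normal equations:", m_normal)
print("lstsq:           ", m_lstsq)
print("|A^dagger e|:    ", np.linalg.norm(A.conj().T @ e))
```

The two solutions agree to rounding error for this well-conditioned example. For poorly conditioned $\mathbf{A}$, forming $\mathbf{A}^\dagger\mathbf{A}$ squares the condition number, so the two routes may differ more noticeably.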