Next: regularized least-squares inversion
Up: Tang: Regularized inversion
Previous: introduction
If we let $\mathbf{m}$ be the model vector and $\mathbf{d}$ be the data vector, Bayes's theorem states that
\begin{equation}
P(\mathbf{m}|\mathbf{d}) = \frac{P(\mathbf{d}|\mathbf{m})\,P(\mathbf{m})}{P(\mathbf{d})},
\tag{1}
\end{equation}
where $P(\mathbf{m}|\mathbf{d})$ is the distribution of the model parameters posterior to the data $\mathbf{d}$, expressing the likelihood of the model for given data; $P(\mathbf{m})$ is the probability distribution of the model $\mathbf{m}$, representing the prior information about the model; and $P(\mathbf{d}|\mathbf{m})$ describes how knowledge of the data modifies the prior knowledge. The quantity $P(\mathbf{d})$ is the probability distribution of the data, which is constant for a given data set; thus it can be seen as a scaling factor (Ulrych et al., 2001).
Therefore, equation (1) can be simplified to
\begin{equation}
P(\mathbf{m}|\mathbf{d}) \propto P(\mathbf{d}|\mathbf{m})\,P(\mathbf{m}).
\tag{2}
\end{equation}
In the presence of noise, the recorded data can be expressed as follows:
\begin{equation}
\mathbf{d} = \mathbf{L}\mathbf{m} + \mathbf{n},
\tag{3}
\end{equation}
where $\mathbf{L}$ is the forward modeling operator and $\mathbf{n}$ is the noise vector. The likelihood function is constructed by taking the difference between the observed data and the modeled data, thus
\begin{equation}
P(\mathbf{d}|\mathbf{m}) = P_n(\mathbf{d} - \mathbf{L}\mathbf{m}),
\tag{4}
\end{equation}
where $P_n$ denotes the probability distribution of the noise.
If we assume the noise has a Gaussian distribution with a zero mean and variance $\sigma_n^2$, the probability function of a single noise sample $n_i$ can be written as follows:
\begin{equation}
P(n_i) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left(-\frac{n_i^2}{2\sigma_n^2}\right).
\tag{5}
\end{equation}
If each component of the noise vector is independent and the variance of each noise sample remains constant, the total probability of the noise vector $\mathbf{n}$ of length $N$ is
\begin{align}
P(\mathbf{n}) &= \prod_{i=1}^{N} P(n_i) \tag{6}\\
&= \left(2\pi\sigma_n^2\right)^{-N/2} \exp\left(-\frac{\mathbf{n}^T\mathbf{n}}{2\sigma_n^2}\right) \tag{7}\\
&= \left(2\pi\sigma_n^2\right)^{-N/2} \exp\left(-\frac{(\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m})}{2\sigma_n^2}\right),
\tag{8}
\end{align}
where $^T$ denotes the adjoint (transpose). If we further assume that the model parameters have a Gaussian distribution with a zero mean and a variance $\sigma_m^2$ and are independent, the probability of the model vector $\mathbf{m}$ of length $M$ then is
\begin{align}
P(\mathbf{m}) &= \prod_{i=1}^{M} \frac{1}{\sqrt{2\pi}\,\sigma_m} \exp\left(-\frac{m_i^2}{2\sigma_m^2}\right) \tag{9}\\
&= \left(2\pi\sigma_m^2\right)^{-M/2} \exp\left(-\frac{\mathbf{m}^T\mathbf{m}}{2\sigma_m^2}\right).
\tag{10}
\end{align}
Plugging equations (8) and (10) into equation (2) yields the following posterior distribution:
\begin{align}
P(\mathbf{m}|\mathbf{d}) &\propto \exp\left(-\frac{(\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m})}{2\sigma_n^2}\right) \exp\left(-\frac{\mathbf{m}^T\mathbf{m}}{2\sigma_m^2}\right) \tag{11}\\
&= \exp\left[-\left(\frac{(\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m})}{2\sigma_n^2} + \frac{\mathbf{m}^T\mathbf{m}}{2\sigma_m^2}\right)\right].
\tag{12}
\end{align}
Since the posterior probability describes the probability that the model is correct given a certain set of observations, we would like to maximize it. Maximizing the posterior function is equivalent to finding the minimum of the following objective function:
\begin{equation}
J(\mathbf{m}) = (\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m}) + \epsilon^2\,\mathbf{m}^T\mathbf{m},
\tag{13}
\end{equation}
where $\epsilon^2 = \sigma_n^2/\sigma_m^2$. We can see that if the model parameters are assumed to
have a Gaussian distribution, the solution of maximum posterior probability under Bayes's theorem is equivalent to
the damped least-squares solution. Thus, Bayesian inversion gives another perspective on the same problem,
where minimizing the model residuals in the L2 norm corresponds to a Gaussian prior probability distribution.
As the Gaussian distribution is a short-tailed function that is tightly centered around its mean, it results in a smooth solution. Therefore, when a sparse solution is required, the L2 norm is no longer appropriate, and long-tailed distribution functions such as the exponential and Cauchy distributions should be chosen for the prior probability distribution of $\mathbf{m}$.
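As a concrete check of the equivalence noted above, the damped least-squares model that minimizes equation (13) can be computed in closed form from the normal equations. The following is a minimal numerical sketch, not part of the original derivation: the dense operator, its dimensions, and the value of $\epsilon^2$ are all made-up assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small dense forward operator L and noisy data d (assumptions).
L = rng.normal(size=(20, 10))
m_true = rng.normal(size=10)
d = L @ m_true + 0.01 * rng.normal(size=20)

eps2 = 0.1  # epsilon^2 = sigma_n^2 / sigma_m^2, assumed known here

# Damped least-squares (maximum a posteriori under a Gaussian prior):
# m = (L^T L + eps^2 I)^{-1} L^T d, which minimizes eq. (13).
m_hat = np.linalg.solve(L.T @ L + eps2 * np.eye(10), L.T @ d)

# At the minimum, the gradient of eq. (13) vanishes.
grad = L.T @ (L @ m_hat - d) + eps2 * m_hat
print(np.linalg.norm(grad))  # numerically zero
```

For a large operator one would of course not form $\mathbf{L}^T\mathbf{L}$ explicitly but solve the damped system iteratively.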
If we keep equation (8) unchanged and assume $\mathbf{m}$ satisfies the exponential probability distribution, with a mean of zero:
\begin{align}
P(m_i) &= \frac{1}{2\sigma_m} \exp\left(-\frac{|m_i|}{\sigma_m}\right), \tag{14}\\
P(\mathbf{m}) &= \left(\frac{1}{2\sigma_m}\right)^{M} \exp\left(-\frac{\sum_{i=1}^{M} |m_i|}{\sigma_m}\right).
\tag{15}
\end{align}
Then the a posteriori probability becomes
\begin{equation}
P(\mathbf{m}|\mathbf{d}) \propto \exp\left(-\frac{(\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m})}{2\sigma_n^2} - \frac{\sum_{i=1}^{M}|m_i|}{\sigma_m}\right).
\tag{16}
\end{equation}
Finding the maximum of the above function is equivalent to finding the minimum of the following objective function:
\begin{equation}
J(\mathbf{m}) = (\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m}) + \epsilon\,R(\mathbf{m}),
\tag{17}
\end{equation}
where $\epsilon = 2\sigma_n^2/\sigma_m$, and the regularization term, $R(\mathbf{m})$, is defined by
\begin{equation}
R(\mathbf{m}) = \|\mathbf{m}\|_1 = \sum_{i=1}^{M} |m_i|.
\tag{18}
\end{equation}
Therefore, by adding an L1 norm regularization term to the least-squares problem, we get a sparse solution
of the model parameters.
Though minimizing the objective function (17) is a non-linear problem, it can be solved efficiently by iteratively reweighted least squares (IRLS). The gradient of the regularization term is
\begin{equation}
\frac{\partial R(\mathbf{m})}{\partial m_i} = \frac{m_i}{|m_i|} = \mathrm{sign}(m_i).
\tag{19}
\end{equation}
Therefore the regularization term $R(\mathbf{m})$, which measures the residual in the L1 norm, can be recast as an L2 norm by introducing a diagonal weighting matrix $\mathbf{W}_m$ with $W_{m,ii} = |m_i|^{-1/2}$:
\begin{equation}
R(\mathbf{m}) = \mathbf{m}^T \mathbf{W}_m^T \mathbf{W}_m \mathbf{m} = \|\mathbf{W}_m \mathbf{m}\|_2^2.
\tag{20}
\end{equation}
Then the objective function (17) can be rewritten as
\begin{align}
J(\mathbf{m}) &= (\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m}) + \epsilon\,\mathbf{m}^T \mathbf{W}_m^T \mathbf{W}_m \mathbf{m} \tag{21}\\
&= \|\mathbf{d}-\mathbf{L}\mathbf{m}\|_2^2 + \epsilon\,\|\mathbf{W}_m \mathbf{m}\|_2^2.
\tag{22}
\end{align}
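For a fixed weight, the identity between the L1 regularization term of equation (18) and the weighted L2 norm of equation (22) is easy to verify numerically. This is a small sketch with a made-up model vector; in practice a tiny stabilizer would be added to $|m_i|$ before inverting, since a zero component makes the weight singular.

```python
import numpy as np

# Check that W_m = diag(|m_i|^{-1/2}) turns the weighted L2 norm
# of eq. (22) into the L1 norm of eq. (18): ||W m||^2 = sum |m_i|.
m = np.array([0.5, -2.0, 3.0, -0.25])   # arbitrary nonzero model (assumption)
W = np.diag(np.abs(m) ** -0.5)

weighted = W @ m
weighted_l2 = weighted @ weighted       # ||W_m m||_2^2
l1 = np.abs(m).sum()                    # ||m||_1
print(weighted_l2, l1)                  # equal up to round-off
```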
The derivation of the objective function for the a priori case with a Cauchy distribution is similar, except that the diagonal weighting matrix changes to the following:
\begin{equation}
W_{m,ii} = \left(1 + \frac{m_i^2}{\sigma_m^2}\right)^{-1/2}.
\tag{23}
\end{equation}
Since $\mathbf{W}_m$ is a function of the model $\mathbf{m}$, if we use the gradient-based IRLS method to minimize the objective function (22), we have to recompute the weight at each iteration. The algorithm can be summarized as follows:

1. At the first iteration, $\mathbf{W}_m$ is set to be the identity matrix:
\begin{equation}
\mathbf{W}_m^{0} = \mathbf{I}.
\tag{24}
\end{equation}
2. At the $k$th iteration, we solve the following fitting goals:
\begin{align}
\mathbf{d} &\approx \mathbf{L}\,\mathbf{m}^{k}, \tag{25}\\
\mathbf{0} &\approx \sqrt{\epsilon}\,\mathbf{W}_m^{k-1}\,\mathbf{m}^{k},
\tag{26}
\end{align}
where $W_{m,ii}^{k-1} = \left(|m_i^{k-1}|\right)^{-1/2}$ in the case of the L1 norm, or $W_{m,ii}^{k-1} = \left(1 + (m_i^{k-1})^2/\sigma_m^2\right)^{-1/2}$ in the case of the Cauchy norm.
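The two-step loop above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used here: the dense operator, the stabilizer `delta`, and the closed-form normal-equations solve per iteration are all made up for the sketch, and for large problems the fitting goals (25)-(26) would instead be solved with an iterative method such as conjugate gradients.

```python
import numpy as np

def irls_l1(L, d, eps, niter=50, delta=1e-8):
    """IRLS for the L1-regularized objective (22).

    Starts with W_m = I (eq. 24), then each iteration rebuilds the
    diagonal weight from the previous model and re-solves the damped
    least-squares system implied by the fitting goals (25)-(26).
    delta is a small stabilizer (an assumption) to avoid dividing by zero.
    """
    n = L.shape[1]
    w = np.ones(n)                          # W_m^0 = I
    m = np.zeros(n)
    for _ in range(niter):
        # Closed-form solve of min ||d - L m||^2 + eps ||W m||^2.
        A = L.T @ L + eps * np.diag(w ** 2)
        m = np.linalg.solve(A, L.T @ d)
        w = (np.abs(m) + delta) ** -0.5     # L1 weight |m_i|^{-1/2}
    return m

# Made-up sparse test problem (assumption, for illustration only).
rng = np.random.default_rng(1)
L = rng.normal(size=(30, 15))
m_true = np.zeros(15)
m_true[[2, 9]] = [1.0, -2.0]                # sparse true model
d = L @ m_true                              # noiseless data
m_hat = irls_l1(L, d, eps=1e-3)
print(np.round(m_hat, 3))                   # close to the sparse m_true
```

Swapping the weight update for `w = (1 + m**2 / sigma_m**2) ** -0.5` gives the Cauchy-norm variant of equation (23).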
Stanford Exploration Project
1/16/2007