Next: regularized least-squares inversion Up: Tang: Regularized inversion Previous: introduction

# Bayesian inversion

If we let $\mathbf{m}$ be the model vector and $\mathbf{d}$ be the data vector, Bayes's theorem states

$$P(\mathbf{m}|\mathbf{d}) = \frac{P(\mathbf{d}|\mathbf{m})\,P(\mathbf{m})}{P(\mathbf{d})}, \tag{1}$$

where $P(\mathbf{m}|\mathbf{d})$ is the distribution of the model parameters posterior to the data $\mathbf{d}$, expressing the likelihood of the model for a given data set; $P(\mathbf{m})$ is the probability distribution of the model $\mathbf{m}$, representing the prior information about the model; and $P(\mathbf{d}|\mathbf{m})$ describes how knowledge of the data modifies the prior knowledge. The quantity $P(\mathbf{d})$ is the probability distribution of the data, which is a constant for a given data set; thus it can be seen as a scaling factor (Ulrych et al., 2001). Therefore, equation (1) can be simplified as

$$P(\mathbf{m}|\mathbf{d}) \propto P(\mathbf{d}|\mathbf{m})\,P(\mathbf{m}). \tag{2}$$
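As a quick numerical illustration of equation (2), the following sketch uses a hypothetical three-model toy problem (all numbers are made up for illustration) to show that $P(\mathbf{d})$ acts only as a normalizing scale factor:

```python
import numpy as np

# Toy discrete illustration of equation (2): the posterior is proportional
# to likelihood times prior; P(d) is only the normalizing constant.
prior = np.array([0.5, 0.3, 0.2])        # P(m) for three candidate models
likelihood = np.array([0.1, 0.6, 0.3])   # P(d|m) for the observed data d

unnormalized = likelihood * prior        # P(d|m) P(m)
p_d = unnormalized.sum()                 # P(d), the scaling factor
posterior = unnormalized / p_d           # P(m|d), sums to 1

print(posterior)
```

Dropping $P(\mathbf{d})$ changes none of the relative posterior probabilities, which is why maximizing the posterior needs only the right-hand side of equation (2).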

In the presence of noise, the recorded data can be expressed as follows:

$$\mathbf{d} = \mathbf{L}\mathbf{m} + \mathbf{n}, \tag{3}$$

where $\mathbf{L}$ is the modeling operator and $\mathbf{n}$ is the noise vector. The likelihood function is constructed by taking the difference between the observed data and the modeled data, thus

$$P(\mathbf{d}|\mathbf{m}) = P_n(\mathbf{d} - \mathbf{L}\mathbf{m}). \tag{4}$$
If we assume the noise has a Gaussian distribution with zero mean and variance $\sigma_n^2$, its probability function can be written as follows:

$$P(n_i) = \frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\left(-\frac{n_i^2}{2\sigma_n^2}\right). \tag{5}$$

If each component of the noise vector is independent and the variance of each noise sample remains constant, the total probability of the noise vector is

$$P(\mathbf{n}) = \prod_{i=1}^{N} P(n_i) \tag{6}$$

$$= \left(\frac{1}{\sqrt{2\pi}\,\sigma_n}\right)^{N}\exp\left(-\frac{\mathbf{n}^T\mathbf{n}}{2\sigma_n^2}\right) \tag{7}$$

$$\propto \exp\left(-\frac{(\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m})}{2\sigma_n^2}\right), \tag{8}$$

where the superscript $T$ denotes the adjoint (transpose). If we further assume that the model parameters are independent and have a Gaussian distribution with zero mean and variance $\sigma_m^2$, the probability of $\mathbf{m}$ then is

$$P(m_i) = \frac{1}{\sqrt{2\pi}\,\sigma_m}\exp\left(-\frac{m_i^2}{2\sigma_m^2}\right), \tag{9}$$

$$P(\mathbf{m}) = \prod_{i=1}^{M} P(m_i) \propto \exp\left(-\frac{\mathbf{m}^T\mathbf{m}}{2\sigma_m^2}\right). \tag{10}$$
Substituting equations (8) and (10) into equation (2) yields the following posterior distribution:

$$P(\mathbf{m}|\mathbf{d}) \propto \exp\left(-\frac{(\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m})}{2\sigma_n^2}\right)\exp\left(-\frac{\mathbf{m}^T\mathbf{m}}{2\sigma_m^2}\right) \tag{11}$$

$$= \exp\left(-\frac{(\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m})}{2\sigma_n^2} - \frac{\mathbf{m}^T\mathbf{m}}{2\sigma_m^2}\right). \tag{12}$$
Since the posterior probability describes the probability that the model is correct given a certain set of observations, we would like it to be maximized. Maximizing the posterior function is equivalent to finding the minimum of the following objective function:

$$J(\mathbf{m}) = (\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m}) + \epsilon^2\,\mathbf{m}^T\mathbf{m}, \tag{13}$$

where $\epsilon^2 = \sigma_n^2/\sigma_m^2$. We can see that if the model parameters are assumed to have a Gaussian distribution, the maximum a posteriori solution under Bayes's theorem is equivalent to the damped least-squares solution. Thus, Bayesian inversion gives another perspective on the same problem: minimizing the model residuals in the L2 norm corresponds to a Gaussian prior probability distribution. Because the Gaussian distribution is a short-tailed function tightly centered around its mean, it results in a smooth solution. Therefore, when a sparse solution is required, the L2 norm is no longer appropriate, and long-tailed distribution functions such as the exponential and Cauchy distributions should be chosen for the prior probability distribution of $\mathbf{m}$.
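Setting the gradient of equation (13) to zero gives the normal equations $(\mathbf{L}^T\mathbf{L} + \epsilon^2\mathbf{I})\,\mathbf{m} = \mathbf{L}^T\mathbf{d}$, i.e., the damped least-squares solution. A minimal numerical sketch of this equivalence follows; the operator, problem sizes, and value of $\epsilon^2$ are illustrative assumptions, not taken from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small linear problem d = L m + n (sizes chosen for illustration).
L = rng.standard_normal((20, 5))
m_true = rng.standard_normal(5)
d = L @ m_true + 0.01 * rng.standard_normal(20)

# MAP solution under a Gaussian prior, equation (13):
#   minimize ||d - L m||^2 + eps^2 ||m||^2
#   =>  (L^T L + eps^2 I) m = L^T d   (damped normal equations)
eps2 = 0.1  # eps^2 = sigma_n^2 / sigma_m^2, treated here as a tuning constant
m_map = np.linalg.solve(L.T @ L + eps2 * np.eye(5), L.T @ d)
print(m_map)
```

Because $J(\mathbf{m})$ is a convex quadratic, this linear solve is its exact global minimizer.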

If we keep equation (8) unchanged and assume $\mathbf{m}$ satisfies an exponential probability distribution with zero mean,

$$P(m_i) = \frac{1}{2\sigma_m}\exp\left(-\frac{|m_i|}{\sigma_m}\right), \tag{14}$$

$$P(\mathbf{m}) \propto \exp\left(-\frac{\sum_i |m_i|}{\sigma_m}\right), \tag{15}$$

then the a posteriori probability becomes

$$P(\mathbf{m}|\mathbf{d}) \propto \exp\left(-\frac{(\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m})}{2\sigma_n^2} - \frac{\sum_i|m_i|}{\sigma_m}\right). \tag{16}$$

Finding the maximum of the above function is equivalent to finding the minimum of the following objective function:

$$J(\mathbf{m}) = (\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m}) + \epsilon^2 R(\mathbf{m}), \tag{17}$$

where $\epsilon^2 = 2\sigma_n^2/\sigma_m$, and the regularization term $R(\mathbf{m})$ is defined by

$$R(\mathbf{m}) = \sum_i |m_i| = \|\mathbf{m}\|_1. \tag{18}$$
Therefore, by adding an L1 norm regularization term to the least-squares problem, we get a sparse solution of the model parameters.

Though minimizing objective function (17) is a non-linear problem, it can be solved efficiently by iteratively reweighted least squares (IRLS). The gradient of the regularization term is

$$\frac{\partial R(\mathbf{m})}{\partial m_i} = \frac{m_i}{|m_i|}. \tag{19}$$

Therefore the L1-norm regularization term $R(\mathbf{m})$ can be expressed as a weighted L2 norm by introducing a diagonal weighting matrix $\mathbf{W}_m = \operatorname{diag}\left(|m_i|^{-1/2}\right)$:

$$R(\mathbf{m}) = \mathbf{m}^T\mathbf{W}_m^T\mathbf{W}_m\mathbf{m} = \|\mathbf{W}_m\mathbf{m}\|_2^2. \tag{20}$$

Then objective function (17) can be changed to

$$J(\mathbf{m}) = (\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m}) + \epsilon^2\,\mathbf{m}^T\mathbf{W}_m^T\mathbf{W}_m\mathbf{m} \tag{21}$$

$$= (\mathbf{d}-\mathbf{L}\mathbf{m})^T(\mathbf{d}-\mathbf{L}\mathbf{m}) + \epsilon^2\,\|\mathbf{W}_m\mathbf{m}\|_2^2. \tag{22}$$
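A quick numerical check of the weighting in equation (20): with $\mathbf{W}_m = \operatorname{diag}(|m_i|^{-1/2})$, the weighted L2 norm reproduces the L1 norm exactly (the model vector below is an arbitrary illustration):

```python
import numpy as np

# Check equation (20): with W_m = diag(|m_i|^{-1/2}),
# m^T W_m^T W_m m = sum_i m_i^2 / |m_i| = sum_i |m_i| = ||m||_1.
m = np.array([1.5, -0.3, 2.0, -0.7])   # illustrative model vector
W = np.diag(np.abs(m) ** -0.5)

weighted_l2 = m @ W.T @ W @ m
l1 = np.abs(m).sum()
print(weighted_l2, l1)   # both equal 4.5
```

The identity holds only at the current model, which is exactly why the weight must be refreshed as the model is updated.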

The derivation of the objective function for the a priori case with a Cauchy distribution is similar, except that the diagonal weighting function changes to the following:

$$\mathbf{W}_m = \operatorname{diag}\left(\left(1 + m_i^2/\sigma_m^2\right)^{-1/2}\right). \tag{23}$$

Since $\mathbf{W}_m$ is a function of the model $\mathbf{m}$, if we use the gradient-based IRLS method to minimize objective function (22), we have to recompute the weight at each iteration. The algorithm can be summarized as follows:

1. At the first iteration, $\mathbf{W}_m$ is set to the identity matrix:

   $$\mathbf{W}_m^{(0)} = \mathbf{I}. \tag{24}$$

2. At the $k$th iteration, we solve the following fitting goals:

   $$\mathbf{d} \approx \mathbf{L}\,\mathbf{m}^{(k)}, \tag{25}$$

   $$\mathbf{0} \approx \epsilon\,\mathbf{W}_m^{(k-1)}\mathbf{m}^{(k)}, \tag{26}$$

   where $W_{m_i}^{(k-1)} = \left|m_i^{(k-1)}\right|^{-1/2}$ in the case of the L1 norm, or $W_{m_i}^{(k-1)} = \left(1 + \left(m_i^{(k-1)}\right)^2/\sigma_m^2\right)^{-1/2}$ in the case of the Cauchy norm.
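The loop above can be sketched as follows. This is a minimal illustration assuming a dense matrix operator and normal-equation solves; the parameter values (`eps`, `sigma_m`, `niter`, and the small stabilizer `delta`) are made up for the example rather than prescribed by the text:

```python
import numpy as np

def irls(L, d, eps=0.1, sigma_m=1.0, norm="l1", niter=10, delta=1e-8):
    """Sketch of the IRLS loop: rebuild the model weight W_m from the
    previous iterate, then solve a damped least-squares problem.
    `delta` guards the L1 weight against division by zero."""
    n = L.shape[1]
    w = np.ones(n)                  # W_m^(0) = I, equation (24)
    m = np.zeros(n)
    for _ in range(niter):
        # Fitting goals (25)-(26) solved via the normal equations:
        #   (L^T L + eps^2 W_m^T W_m) m = L^T d
        A = L.T @ L + eps**2 * np.diag(w**2)
        m = np.linalg.solve(A, L.T @ d)
        if norm == "l1":
            w = (np.abs(m) + delta) ** -0.5           # L1 weight
        else:
            w = (1.0 + m**2 / sigma_m**2) ** -0.5     # Cauchy weight
    return m

# Illustrative sparse-recovery test problem.
rng = np.random.default_rng(1)
L = rng.standard_normal((30, 10))
m_true = np.zeros(10)
m_true[[2, 7]] = [1.0, -2.0]        # sparse model: two live entries
d = L @ m_true + 0.01 * rng.standard_normal(30)
print(irls(L, d))
```

Switching `norm="cauchy"` changes only the reweighting rule; the Cauchy weight stays bounded near 1 for small amplitudes, so it penalizes weak model components less aggressively than the L1 weight.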

Stanford Exploration Project
1/16/2007