Next: FEWER EQUATIONS THAN UNKNOWNS Up: Data modeling by least Previous: MORE EQUATIONS THAN UNKNOWNS

WEIGHTS AND CONSTRAINTS

It often happens that some observations are considered more reliable than others. One may desire to weight the more reliable data more heavily in the calculation. In other words, we may multiply the ith equation by a weight $\sqrt{w_i}$

$\begin{displaymath} \sqrt{w_i} \; [-c_i\quad a_{i1} \quad a_{i2} \quad \cdots ]... ... \\ x_2 \\ \vdots \end{array} \right] \eq \sqrt{w_i} e_i\end{displaymath}$ (28)

Now the weighted sum-squared error will be

$\begin{displaymath} E \eq \sum_i w_i {e_i}^2\end{displaymath}$ (29)

Following the method of the last section, it is easy to show that the x which minimizes the weighted error E of (29) is the x which satisfies the simultaneous equations

$\begin{displaymath} \left\{ \sum_{i} w_i \; \left[ \begin{array} {c} -c_i \\ ... ...{array} {c} v \\ 0 \\ 0 \\ \vdots \end{array} \right]\end{displaymath}$ (30)

Choice of a set of weights is often a rather subjective matter. However, if data are of uneven quality, it cannot be avoided. Omitting w is equivalent to choosing it equal to unity.

A case of common interest is where some equations should be solved exactly. Such equations are called constraint equations. Constraint equations often arise out of theoretical considerations so they may, in principle, not have any error. The rest of the equations often involve some measurement. Since the measurement can often be made many times, it is easy to get a lot more equations than unknowns. Since measurement always involves error, we then use the method of least squares to minimize the average error. In order to be certain that the constraint equations are solved exactly, one could use the trick of applying very large weight factors to the constraint equations. A problem is that ``very large'' is not well defined. A weight equal 10¹⁰ might not be large enough to guarantee the constraint equation is satisfied with sufficient accuracy. On the other hand, 10¹⁰ might lead to disastrous round-off when solving the simultaneous equations in a computer with eight-digit accuracy. The best approach is to analyze the situation theoretically for $w \rightarrow \infty$ .

An example of a constraint equation is that the sum of the x_i equals M. Another constraint would be x₁ = x₂. Arranged in a matrix, these two constraint equations are

$\begin{displaymath} \left[ \begin{array} {rcrccc} -M & 1 & 1 & 1 & 1 & \cdots \... ...t] \eq \left[ \begin{array} {c} 0 \\ 0 \end{array} \right]\end{displaymath}$ (31)

We write a general set of k constraint equations as

$\begin{displaymath} {\bf G} \left[ \begin{array} {c} 1 \\ x_1 \\ x_2 \\ ... ...eq \left[ \begin{array} {c} 0 \\ \vdots \end{array} \right]\end{displaymath}$ (32)

Minimizing the error as $w \rightarrow \infty$ of the equations

$\begin{eqnarraystar} \sqrt{w} {\bf Gx} &\approx & {\bf 0} \nonumber \\ {\bf Bx} &\approx & {\bf 0} \end{eqnarraystar}$

is algebraically similar to minimizing the error of Bx $\approx$ 0. The rows of $\sqrt{w} {\bf G}$ are just like some extra rows for B. The resulting equation for x is

$\begin{displaymath} \left\{ \sum^n_{i = 1} \left[ \begin{array} {c} -c_i \\ ... ... \begin{array} {c} v \\ 0 \\ \vdots \end{array} \right]\end{displaymath}$ (33)

Now we will take all the w_i to equal $1/\varepsilon$ and we will let $\varepsilon$ tend to zero. Also let

$\begin{eqnarray} {\bf x} &= & {\bf x}^{(0)} + \varepsilon {\bf x}^{(1)} + \var... ... \varepsilon {\bf v}^{(1)} + \varepsilon^2 {\bf v}^{(2)} + \cdots\end{eqnarray}$ (34)

(35)

With this, (33) may be written

$\begin{displaymath} \left( {\bf B}^T {\bf B} + {1 \over {\varepsilon}} {\bf G}^T... ...\cdots) \eq {\bf v}^{(0)} + {\bf v}^{(1)} \varepsilon + \cdots\end{displaymath}$ (36)

Identify coefficients of powers of $\varepsilon$

$\begin{eqnarray} \varepsilon^{-1} & : & {\bf G}^T {\bf Gx}^{(0)} = {\bf 0} \\ \... ... \varepsilon^1, \varepsilon^2 & : & \hbox{not required} \nonumber\end{eqnarray}$ (37)

(38)

Equation (37) is m equations in m unknowns. It will automatically be satisfied if the k equations in (32) are satisfied. Equation (38) appears to involve the m unknowns in ${\bf x}^{(0)}$ plus m more unknowns in ${\bf x}^{(1)}$ . In fact, we do not need ${\bf x}^{(1)}$ ;the k unknowns

$\begin{displaymath} \lambda \eq {\bf Gx}^{(1)}\end{displaymath}$ (39)

will suffice.

Arranging (38) and (32) together and dropping superscripts, we get a square matrix in m + k unknowns.

$\begin{displaymath} \left[ \begin{array} {c} {\bf B}^T {\bf B} \\ \vbox{\h... ...t[ \begin{array} {c} v \\ 0 \\ \vdots \end{array} \right]\end{displaymath}$ (40)

Equation (40) is now a simultaneous set for the unknowns ${\bf x}$ and $\lambda$ . It might also be thought of as the solution to the problem of minimizing the quadratic form

$\begin{eqnarraystar} E &= & [{\bf x}^T \quad \lambda^T]\left[ \begin{array} {cc}... ... \lambda^T {\bf Gx} + {\bf x}^T {\bf G}^T \lambda \nonumber \end{eqnarraystar}$

and since we can always transpose a scalar,

$\begin{displaymath} E \eq {\bf x}^T {\bf B}^T {\bf Bx} + 2\lambda^T {\bf Gx}\end{displaymath}$ (41)

According to the method of Lagrange multipliers, one may minimize a quadratic form subject to constraints by minimizing instead a sum of the quadratic form plus constraint terms where each constraint term is the product of a constraint equation multiplied by a Lagrange multiplier $\lambda_i$ . This is precisely what we have in (41), and the solution is given by (40). Lagrange multipliers frequently arise in connection with integral equations. The concept is readily transformed to matrices merely by approximating integration by summation.

EXERCISES:

In determining a density vs. depth profile of the earth one might minimize the squared difference between some theoretical quantities (say, the frequencies of free oscillation) and the observed quantities. By astronomical means, total mass and moment of inertia of the earth are very well known. If the earth is divided into arbitrarily thin shells of equal thickness, what are the two astronomical constraint equations on the layer densities p_i? If the least-squares problem is nonlinear (as it often is) it may be linearized by assuming that a given set of densities p_i is a good guess which satisfies the constraints and doing least squares for the perturbation dp_i. What are the constraint equations on dp_i?

Next: FEWER EQUATIONS THAN UNKNOWNS Up: Data modeling by least Previous: MORE EQUATIONS THAN UNKNOWNS

Stanford Exploration Project
10/30/1997

	$\begin{eqnarraystar} \sqrt{w} {\bf Gx} &\approx & {\bf 0} \nonumber \\ {\bf Bx} &\approx & {\bf 0} \end{eqnarraystar}$

	$\begin{eqnarray} {\bf x} &= & {\bf x}^{(0)} + \varepsilon {\bf x}^{(1)} + \var... ... \varepsilon {\bf v}^{(1)} + \varepsilon^2 {\bf v}^{(2)} + \cdots\end{eqnarray}$	(34)
		(35)

	$\begin{eqnarray} \varepsilon^{-1} & : & {\bf G}^T {\bf Gx}^{(0)} = {\bf 0} \\ \... ... \varepsilon^1, \varepsilon^2 & : & \hbox{not required} \nonumber\end{eqnarray}$	(37)
		(38)

	$\begin{eqnarraystar} E &= & [{\bf x}^T \quad \lambda^T]\left[ \begin{array} {cc}... ... \lambda^T {\bf Gx} + {\bf x}^T {\bf G}^T \lambda \nonumber \end{eqnarraystar}$