Next: WEIGHTS AND CONSTRAINTS
Up: Data modeling by least
Previous: Data modeling by least
When there are more linear equations than unknowns,
it is usually impossible
to find a solution which satisfies all the equations.
Then one often looks for a solution
which approximately satisfies all the equations.
Let a and c be known
and x be unknown in the following set of equations
where there are more equations than unknowns.
    c_j \approx \sum_{i=1}^{n} a_{ji}\, x_i , \qquad j = 1, 2, \ldots, m \quad (m > n)        (1)
Usually there will be no set of
xi which exactly satisfies (1).
Let us define an error vector ej by
    e_j = -c_j + \sum_{i=1}^{n} a_{ji}\, x_i        (2)
It simplifies the development to rewrite this equation as follows
(a trick I learned from John P. Burg).
    e_j = \begin{bmatrix} c_j & a_{j1} & a_{j2} & \cdots & a_{jn} \end{bmatrix}
          \begin{bmatrix} -1 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}        (3)
We may abbreviate this equation as
    \mathbf{e} = \mathbf{B}\,\hat{\mathbf{x}} , \qquad \hat{\mathbf{x}} = (-1,\ x_1,\ \ldots,\ x_n)^T        (4)
where B is the matrix containing c and a.
Each error e_j may thus be written as the dot product of the jth row of B
with the column vector (-1, x_1, \ldots, x_n)^T.
Now we will minimize the sum squared error
E defined as
    E = \sum_j e_j^2 = \mathbf{e}^T \mathbf{e} = (\mathbf{B}\hat{\mathbf{x}})^T (\mathbf{B}\hat{\mathbf{x}})        (5)
The summation over j may be brought inside, past the constants \hat x_i:

    E = \sum_i \sum_k \hat x_i \Bigl( \sum_j b_{ji}\, b_{jk} \Bigr) \hat x_k
      = \hat{\mathbf{x}}^T (\mathbf{B}^T \mathbf{B})\, \hat{\mathbf{x}}        (6)
The matrix in the center,
call it r_{ik} = \sum_j b_{ji} b_{jk},
is symmetric.
It is a positive (more strictly, nonnegative)
definite matrix because you will never be able
to find an x for which E is negative,
since E is a sum of squared ei.
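This nonnegativity is easy to confirm numerically. The following sketch (Python with NumPy; the system below is arbitrary made-up data, not from the text) forms the matrix of (6) from a small overdetermined system and checks that it is symmetric and that E is never negative:

```python
import numpy as np

# Arbitrary overdetermined system: 5 equations, 2 unknowns (made-up data).
rng = np.random.default_rng(0)
a = rng.standard_normal((5, 2))   # coefficients a_ji
c = rng.standard_normal(5)        # data c_j

B = np.column_stack([c, a])       # B contains c and a, as in (4)
R = B.T @ B                       # the matrix r of equation (6)

# R is symmetric.
assert np.allclose(R, R.T)

# E = xhat^T R xhat is a sum of squared errors, hence never negative,
# whatever x we try.
for _ in range(100):
    xhat = np.concatenate([[-1.0], rng.standard_normal(2)])
    E = xhat @ R @ xhat
    assert E >= 0.0
```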
We find the x with minimum E by requiring
\partial E / \partial x_i = 0 for i = 1, 2, \ldots, n.
Notice that this will give us exactly one
equation for each unknown.
In order to clarify the presentation we will
specialize (6) to two unknowns.
    E = \begin{bmatrix} -1 & x_1 & x_2 \end{bmatrix}
        \begin{bmatrix} r_{00} & r_{01} & r_{02} \\ r_{10} & r_{11} & r_{12} \\ r_{20} & r_{21} & r_{22} \end{bmatrix}
        \begin{bmatrix} -1 \\ x_1 \\ x_2 \end{bmatrix}        (7)
Setting to zero the derivative with respect to x1, we get
    0 = \frac{\partial E}{\partial x_1}
      = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}
        \begin{bmatrix} r_{00} & r_{01} & r_{02} \\ r_{10} & r_{11} & r_{12} \\ r_{20} & r_{21} & r_{22} \end{bmatrix}
        \begin{bmatrix} -1 \\ x_1 \\ x_2 \end{bmatrix}
      + \begin{bmatrix} -1 & x_1 & x_2 \end{bmatrix}
        \begin{bmatrix} r_{00} & r_{01} & r_{02} \\ r_{10} & r_{11} & r_{12} \\ r_{20} & r_{21} & r_{22} \end{bmatrix}
        \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}        (8)
Since rij = rji,
both terms on the right are equal.
Thus (8) may be written
    0 = -r_{10} + r_{11}\, x_1 + r_{12}\, x_2        (9)
Likewise,
differentiating with respect to x2 gives
    0 = -r_{20} + r_{21}\, x_1 + r_{22}\, x_2        (10)
Equations (9) and (10) may be combined
    \begin{bmatrix} 0 \\ 0 \end{bmatrix}
    = \begin{bmatrix} r_{10} & r_{11} & r_{12} \\ r_{20} & r_{21} & r_{22} \end{bmatrix}
      \begin{bmatrix} -1 \\ x_1 \\ x_2 \end{bmatrix}        (11)
This form is two equations in two unknowns.
One might write it in the more conventional form
    \begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix}
    \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
    = \begin{bmatrix} r_{10} \\ r_{20} \end{bmatrix}        (12)
The matrix of (11) lacks
only a top row to be equal to the matrix
of (7).
To give it that row,
we may augment (11) by
    -v = \begin{bmatrix} r_{00} & r_{01} & r_{02} \end{bmatrix}
         \begin{bmatrix} -1 \\ x_1 \\ x_2 \end{bmatrix}        (13)
where (13) may be regarded
as a definition of a new variable v.
Putting (13) on top of (11) we get
    \begin{bmatrix} -v \\ 0 \\ 0 \end{bmatrix}
    = \begin{bmatrix} r_{00} & r_{01} & r_{02} \\ r_{10} & r_{11} & r_{12} \\ r_{20} & r_{21} & r_{22} \end{bmatrix}
      \begin{bmatrix} -1 \\ x_1 \\ x_2 \end{bmatrix}        (14)
The solution x of (12)
or (14) is that set of
xk for which E is a minimum.
To get an interpretation of v, we may
multiply both sides of (14) by the row vector
\begin{bmatrix} -1 & x_1 & x_2 \end{bmatrix},
getting
    v = \begin{bmatrix} -1 & x_1 & x_2 \end{bmatrix}
        \begin{bmatrix} r_{00} & r_{01} & r_{02} \\ r_{10} & r_{11} & r_{12} \\ r_{20} & r_{21} & r_{22} \end{bmatrix}
        \begin{bmatrix} -1 \\ x_1 \\ x_2 \end{bmatrix}        (15)
Comparing (15) with (7),
we see that v is the minimum value of E.
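The two-unknown development can be verified numerically. The sketch below (NumPy, with made-up data) solves the conventional form (12), then checks that the augmented form (14) holds and that v equals the minimum E, as (15) states:

```python
import numpy as np

# Made-up overdetermined system: 6 equations, 2 unknowns.
rng = np.random.default_rng(1)
a = rng.standard_normal((6, 2))
c = rng.standard_normal(6)

B = np.column_stack([c, a])
R = B.T @ B                       # r_ik, with rows/columns indexed 0..2

# Conventional form (12): solve the lower-right 2x2 block for (x1, x2).
x = np.linalg.solve(R[1:, 1:], R[1:, 0])

# Augmented form (14): R (-1, x1, x2)^T = (-v, 0, 0)^T.
xhat = np.concatenate([[-1.0], x])
rhs = R @ xhat
v = -rhs[0]

# The lower entries vanish: equations (9) and (10) are satisfied.
assert np.allclose(rhs[1:], 0.0)

# Equation (15): v is the minimum value of E.
E_min = xhat @ R @ xhat
assert np.isclose(v, E_min)
```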
Occasionally,
it is more convenient to have the essential equations in
partitioned matrix form.
In partitioned matrix form,
writing \mathbf{B} = [\, \mathbf{c} \ \ \mathbf{A} \,],
we have for the error (6)

    E = \begin{bmatrix} -1 & \mathbf{x}^T \end{bmatrix}
        \begin{bmatrix} \mathbf{c}^T\mathbf{c} & \mathbf{c}^T\mathbf{A} \\ \mathbf{A}^T\mathbf{c} & \mathbf{A}^T\mathbf{A} \end{bmatrix}
        \begin{bmatrix} -1 \\ \mathbf{x} \end{bmatrix}        (16)
The final equation (14) splits into
    -v = -\mathbf{c}^T\mathbf{c} + \mathbf{c}^T\mathbf{A}\,\mathbf{x}        (17)

    \mathbf{0} = -\mathbf{A}^T\mathbf{c} + \mathbf{A}^T\mathbf{A}\,\mathbf{x}        (18)
where (18) represents
simultaneous equations to be solved for x.
Equation (18) is what you have to set up in a computer.
It is easily remembered by a quick and dirty
(very dirty) derivation.
That is,
we began with the overdetermined equations
\mathbf{A}\mathbf{x} \approx \mathbf{c};
premultiplying by \mathbf{A}^T
gives \mathbf{A}^T\mathbf{A}\,\mathbf{x} = \mathbf{A}^T\mathbf{c},
which is (18).
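The quick and dirty derivation translates directly into code. A minimal sketch (NumPy, arbitrary made-up data) sets up (18) and checks the answer against a library least-squares solver:

```python
import numpy as np

# Overdetermined equations A x ≈ c (made-up data: 8 equations, 3 unknowns).
rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))
c = rng.standard_normal(8)

# Premultiply by A^T and solve the square system (18): A^T A x = A^T c.
x = np.linalg.solve(A.T @ A, A.T @ c)

# Agrees with NumPy's built-in least-squares routine.
x_ref = np.linalg.lstsq(A, c, rcond=None)[0]
assert np.allclose(x, x_ref)
```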
In physical science applications,
the variable zj is frequently a complex
variable, say zj = xj + iyj.
It is always possible to go through the
foregoing analyses,
treating the problem as though xi and yi were
real independent variables.
There is a considerable gain in simplicity and a
saving in computational effort
by treating zj as a single complex variable.
The error E may be regarded
as a function of either x_j and y_j or
z_j and \bar z_j.
In general there may be N complex unknowns z_1, \ldots, z_N, but we will
treat the case N = 1 here
and leave the general case for the Exercises.
The minimum is found where
    0 = \frac{\partial E}{\partial x}
      = \frac{\partial E}{\partial z}\,\frac{\partial z}{\partial x}
      + \frac{\partial E}{\partial \bar z}\,\frac{\partial \bar z}{\partial x}
      = \frac{\partial E}{\partial z} + \frac{\partial E}{\partial \bar z}        (19)

    0 = \frac{\partial E}{\partial y}
      = \frac{\partial E}{\partial z}\,\frac{\partial z}{\partial y}
      + \frac{\partial E}{\partial \bar z}\,\frac{\partial \bar z}{\partial y}
      = i\,\frac{\partial E}{\partial z} - i\,\frac{\partial E}{\partial \bar z}        (20)
Multiplying (20) by i
and adding and subtracting these equations,
we may express the minimum condition more simply as
    0 = \frac{\partial E}{\partial z}        (21)

    0 = \frac{\partial E}{\partial \bar z}        (22)
However,
the usual case is that E is a positive real quadratic function of
z and \bar z, and that
\partial E / \partial \bar z is merely the complex
conjugate of \partial E / \partial z.
Then the two conditions
(21) and (22)
may be replaced by either one of them.
Usually,
when working with complex variables we are minimizing a positive
quadratic form like
    E = (\mathbf{c} - \mathbf{A}\mathbf{z})^* (\mathbf{c} - \mathbf{A}\mathbf{z})        (23)
where * denotes complex-conjugate transpose.
Now (22) gives
    0 = \mathbf{A}^* (\mathbf{A}\mathbf{z} - \mathbf{c}) ,
    \qquad \text{or} \qquad
    \mathbf{A}^*\mathbf{A}\,\mathbf{z} = \mathbf{A}^*\mathbf{c}        (24)
which is just the complex form of (18).
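Treating z as a single complex variable really does save effort in practice: the code is the same as in the real case, with the transpose replaced by the conjugate transpose. A sketch with arbitrary made-up complex data:

```python
import numpy as np

# Complex overdetermined system A z ≈ c (made-up data: 7 equations, 2 unknowns).
rng = np.random.default_rng(3)
A = rng.standard_normal((7, 2)) + 1j * rng.standard_normal((7, 2))
c = rng.standard_normal(7) + 1j * rng.standard_normal(7)

# Equation (24): A* A z = A* c, with * the conjugate transpose.
As = A.conj().T
z = np.linalg.solve(As @ A, As @ c)

# NumPy's least-squares routine handles complex systems and should agree.
z_ref = np.linalg.lstsq(A, c, rcond=None)[0]
assert np.allclose(z, z_ref)
```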
Let us consider an example.
Suppose a set of wave arrival times ti is
measured at sensors located on the x axis at points xi.
Suppose the wavefront is to be fitted to a
parabola t \approx a + b x + c x^2. Here,
the x_i are knowns and a, b, and c are unknowns.
For each sensor i we have an equation
    t_i \approx a + b\, x_i + c\, x_i^2        (25)
When there are more than three sensors,
we have more equations than unknowns.
In this example, (14) takes the form
    \begin{bmatrix} -v \\ 0 \\ 0 \\ 0 \end{bmatrix}
    = \begin{bmatrix}
        \sum t_i^2     & \sum t_i     & \sum t_i x_i & \sum t_i x_i^2 \\
        \sum t_i       & N            & \sum x_i     & \sum x_i^2 \\
        \sum t_i x_i   & \sum x_i     & \sum x_i^2   & \sum x_i^3 \\
        \sum t_i x_i^2 & \sum x_i^2   & \sum x_i^3   & \sum x_i^4
      \end{bmatrix}
      \begin{bmatrix} -1 \\ a \\ b \\ c \end{bmatrix}        (26)

where the sums run over the N sensors.
This may be solved by standard methods for a, b, and c.
The last three rows of (26) may be written
    0 = \sum_i e_i , \qquad
    0 = \sum_i e_i\, x_i , \qquad
    0 = \sum_i e_i\, x_i^2        (27)

where e_i = a + b x_i + c x_i^2 - t_i.
This says the error vector ei is
perpendicular (or normal) to the functions
1, x, and x2,
which we are fitting to the data.
For that reason these
equations are often called normal equations.
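As a concrete check on the parabola example (the sensor positions and arrival times below are hypothetical, invented for illustration), we can fit (25) by the normal equations and verify the orthogonality stated by (27):

```python
import numpy as np

# Hypothetical sensor positions on the x axis and measured arrival times.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
t = np.array([ 4.1,  1.2, 0.1, 0.9, 4.2, 8.8])

# One row (1, x_i, x_i^2) per sensor: six equations, three unknowns.
A = np.column_stack([np.ones_like(x), x, x**2])

# Normal equations (18): A^T A (a, b, c)^T = A^T t.
abc = np.linalg.solve(A.T @ A, A.T @ t)

# Error at each sensor, e_i = a + b x_i + c x_i^2 - t_i.
e = A @ abc - t

# Equations (27): the error is orthogonal to the fitted functions 1, x, x^2.
for col in A.T:
    assert abs(col @ e) < 1e-9
```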
EXERCISES:
- Extend (24) by fitting waves observed in the x,
y plane to a two-dimensional quadratic.
- Let y(t) constitute a complex-valued function
at successive integer values of t.
Fit y(t) to a least-squares straight line
y(t) \approx a + b t, where
a = a_r + i a_i and b = b_r + i b_i.
Do it two ways: (a) Assume a_r, a_i, b_r, and b_i are four independent variables,
and (b) Assume
a, b, \bar a, and \bar b are independent
variables. (Leave answer in terms of … .)
- Equation (14) has assumed all quantities are real.
Generalize equation (14) to all complex quantities.
Verify that the matrix is Hermitian.
- At the jth seismic observatory (latitude xj, longitude yj)
earthquake waves are observed to arrive at time tj.
It has been conjectured
that the earthquake has an origin time t,
latitude x, and longitude y.
The theoretical travel time may be looked up
in a travel-time table T(\Delta), where T is the travel time and
\Delta is the great-circle angle between the epicenter and the station.
One has

    \cos \Delta_j = \sin x \sin x_j + \cos x \cos x_j \cos (y - y_j)

The time residual at the jth station,
supposing that the earthquake
occurred at (x, y, t), is

    r_j = t_j - t - T(\Delta_j)

The time residual,
supposing that the earthquake occurred at
(x + dx, y + dy, t + dt), is, to first order,

    r_j - dt - \frac{dT}{d\Delta}
      \Bigl( \frac{\partial \Delta_j}{\partial x}\, dx
           + \frac{\partial \Delta_j}{\partial y}\, dy \Bigr)
Find equations to solve for dx,
dy, and dt which minimize the
sum-squared time residuals.
- Gravity gj has been measured at
N irregularly spaced points on the
surface of the earth
(colatitude xj, longitude yj, j = 1, N).
Show that the matrix of the normal equation
which fits the data to spherical
harmonics may be written as a sum of a column times its transpose,
as in the preceding problem.
How would the matrix simplify if there were infinitely
many uniformly spaced data points?
(NOTE: Spherical harmonics S are the class of functions
P_{mn}(\cos x) \cos my and P_{mn}(\cos x) \sin my
for n = 0, 1, 2, \ldots and 0 \le m \le n,
where P_{mn} is an associated Legendre polynomial of degree n
and order m.)
- Ocean tides fit sinusoidal functions of
known frequencies quite accurately.
Associated with the tide is an earth tilt.
A complex time series Z(t)
may be made from the north-south tilt plus
i times the east-west tilt.
The observed complex time series may be fitted to an analytical form

    Z(t) \approx \sum_j A_j e^{i \omega_j t}

where the frequencies \omega_j are known.
Find a set of equations which may be
solved for the Aj which gives the best fit
of the formula to the data.
Show that some elements of the normal equation matrix
are sums which may
be summed analytically.
- The general solution to Laplace's equation
in cylindrical coordinates
for a potential field P
which vanishes at r = \infty is given by

    P(r, \theta) = \sum_{m \ne 0} A_m\, r^{-|m|} e^{i m \theta}
Find the potential field surrounding
a square object at the origin which is
at unit potential.
Do this by finding N of the coefficients
A_m by minimizing the squared difference
between P and unity integrated
around the square.
Give the answer in terms of an inverse matrix of integrals.
Which coefficients Am vanish exactly by symmetry?
Stanford Exploration Project
10/30/1997