
Inside an abstract vector

In engineering uses, a vector has three scalar components that correspond to the three dimensions of the space in which we live. In least-squares data analysis, a vector is a one-dimensional array that can contain many different things. Such an array is an ``abstract vector.'' For example, in earthquake studies, the vector might contain the time an earthquake began, as well as its latitude, longitude, and depth. Alternatively, the abstract vector might contain as many components as there are seismometers, and each component might be the arrival time of an earthquake wave. Used in signal analysis, the vector might contain the values of a signal at successive instants in time or, alternatively, a collection of signals. These signals might be ``multiplexed'' (interlaced) or ``demultiplexed'' (all of each signal preceding the next). When used in image analysis, the one-dimensional array might contain an image, which could itself be thought of as an array of signals. Vectors, including abstract vectors, are usually denoted by boldface letters such as $\bold p$ and $\bold s$. Like physical vectors, abstract vectors are orthogonal when their dot product vanishes: $\bold p \cdot \bold s = 0$. Orthogonal vectors are well known in physical space; we will also encounter them in abstract vector space.
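As a small numerical aside (a sketch in NumPy with made-up components, not taken from the text), two abstract vectors are orthogonal exactly when their dot product evaluates to zero:

\begin{verbatim}
import numpy as np

p = np.array([1.0, 2.0, 0.0, -1.0])   # an abstract 4-component vector
s = np.array([2.0, -1.0, 3.0, 0.0])   # another abstract vector

print(np.dot(p, s))   # prints 0.0, so p and s are orthogonal
\end{verbatim}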

We consider first a hypothetical application with one data vector ${\bf d}$ and two fitting vectors $\bold f_1$ and $\bold f_2$. Each fitting vector is also known as a ``regressor.'' Our first task is to approximate the data vector ${\bf d}$ by a scaled combination of the two regressor vectors. The scale factors $x_1$ and $x_2$ should be chosen so that the model matches the data; i.e.,
 \begin{displaymath}
{\bf d}\quad \approx \quad \bold f_1 x_1 + \bold f_2 x_2\end{displaymath} (11)

Notice that we could take the partial derivative of the data in (11) with respect to an unknown, say $x_1$, and the result is the regressor $\bold f_1$.

The partial derivative of all theoretical data with respect to any model parameter gives a regressor. A regressor is a column in the matrix of partial derivatives, $\partial d_i /\partial m_j$.
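The following sketch (NumPy, with a hypothetical $\bold F$ and $\bold x$) checks this statement numerically: the finite-difference derivative of the theoretical data with respect to $x_1$ reproduces the first column of $\bold F$.

\begin{verbatim}
import numpy as np

F = np.array([[1.0, 0.5],
              [2.0, 1.0],
              [0.0, 3.0]])        # columns are the regressors f1, f2
x = np.array([0.7, -0.2])         # model parameters x1, x2

def d_theor(x):
    return F @ x                  # theoretical data f1*x1 + f2*x2

eps = 1e-6
dd_dx1 = (d_theor(x + [eps, 0.0]) - d_theor(x)) / eps
print(np.allclose(dd_dx1, F[:, 0]))   # True: the derivative is f1
\end{verbatim}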

The fitting goal (11) is often expressed in the more compact mathematical matrix notation ${\bf d}\approx \bold F \bold x$, but in our derivation here we will keep track of each component explicitly and use mathematical matrix notation to summarize the final result. Fitting the observed data $\bold d = \bold d^{\rm obs}$ to its two theoretical parts $\bold f_1 x_1$ and $\bold f_2 x_2$ can be expressed as minimizing the length of the residual vector $\bold r$, where
\begin{eqnarray}
\bold 0 \quad\approx\quad \bold r &=& \bold d^{\rm theor} - \bold d^{\rm obs} \\
\bold 0 \quad\approx\quad \bold r &=& \bold f_1 x_1 + \bold f_2 x_2 \ -\ {\bf d}
\end{eqnarray} (12, 13)
We use a dot product to construct a sum of squares (also called a ``quadratic form'') of the components of the residual vector:
\begin{eqnarray}
Q(x_1,x_2) &=& \bold r \cdot \bold r \\
 &=& ({\bf f}_1 x_1 + {\bf f}_2 x_2 - {\bf d}) \cdot ({\bf f}_1 x_1 + {\bf f}_2 x_2 - {\bf d})
\end{eqnarray} (14, 15)
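As an illustration (a NumPy sketch with made-up values for $\bold f_1$, $\bold f_2$, and $\bold d$), the quadratic form is just the residual dotted with itself:

\begin{verbatim}
import numpy as np

f1 = np.array([1.0, 2.0, 0.0])    # hypothetical regressors and data
f2 = np.array([0.5, 1.0, 3.0])
d  = np.array([1.2, 2.3, 0.9])

def Q(x1, x2):
    r = f1 * x1 + f2 * x2 - d     # residual vector, equation (13)
    return np.dot(r, r)           # sum of squared components

print(Q(1.0, 0.1))
\end{verbatim}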
To find the gradient of the quadratic form $Q(x_1,x_2)$, you might be tempted to expand out the dot product into all nine terms and then differentiate. It is less cluttered, however, to remember the product rule, that
\begin{displaymath}
{d\over dx}\, (\bold r \cdot \bold r)
\eq
{d\bold r \over dx} \cdot \bold r
\ +\
\bold r \cdot {d\bold r \over dx}
\end{displaymath} (16)
Thus, the gradient of $Q(x_1,x_2)$ is defined by its two components:
\begin{eqnarray}
{\partial Q \over \partial x_1} &=&
 {\bf f}_1 \cdot ({\bf f}_1 x_1 + {\bf f}_2 x_2 - {\bf d})
 + ({\bf f}_1 x_1 + {\bf f}_2 x_2 - {\bf d}) \cdot {\bf f}_1 \\
{\partial Q \over \partial x_2} &=&
 {\bf f}_2 \cdot ({\bf f}_1 x_1 + {\bf f}_2 x_2 - {\bf d})
 + ({\bf f}_1 x_1 + {\bf f}_2 x_2 - {\bf d}) \cdot {\bf f}_2
\end{eqnarray} (17, 18)
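A quick numerical check (same hypothetical vectors as above) confirms that the analytic gradient of equations (17) and (18), which simplifies to $2\,{\bf f}_i \cdot \bold r$, agrees with finite differences:

\begin{verbatim}
import numpy as np

f1 = np.array([1.0, 2.0, 0.0])
f2 = np.array([0.5, 1.0, 3.0])
d  = np.array([1.2, 2.3, 0.9])

def Q(x1, x2):
    r = f1 * x1 + f2 * x2 - d
    return np.dot(r, r)

def grad_Q(x1, x2):               # equations (17) and (18): 2 f_i . r
    r = f1 * x1 + f2 * x2 - d
    return np.array([2.0 * np.dot(f1, r), 2.0 * np.dot(f2, r)])

x1, x2, eps = 1.0, 0.1, 1e-6
fd = np.array([(Q(x1 + eps, x2) - Q(x1, x2)) / eps,
               (Q(x1, x2 + eps) - Q(x1, x2)) / eps])
print(np.allclose(grad_Q(x1, x2), fd, atol=1e-4))   # True
\end{verbatim}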
Setting these derivatives to zero and using $({\bf f}_1 \cdot {\bf f}_2)=({\bf f}_2 \cdot {\bf f}_1)$ etc., we get
\begin{eqnarray}
({\bf f}_1 \cdot {\bf d}) &=& ({\bf f}_1 \cdot {\bf f}_1) x_1 + ({\bf f}_1 \cdot {\bf f}_2) x_2 \\
({\bf f}_2 \cdot {\bf d}) &=& ({\bf f}_2 \cdot {\bf f}_1) x_1 + ({\bf f}_2 \cdot {\bf f}_2) x_2
\end{eqnarray} (19, 20)
We can use these two equations to solve for the two unknowns $x_1$ and $x_2$. Writing this expression in matrix notation, we have
\begin{displaymath}
\left[ \begin{array}{c}
 ({\bf f}_1 \cdot {\bf d}) \\
 ({\bf f}_2 \cdot {\bf d})
\end{array} \right]
\eq
\left[ \begin{array}{cc}
 ({\bf f}_1 \cdot {\bf f}_1) & ({\bf f}_1 \cdot {\bf f}_2) \\
 ({\bf f}_2 \cdot {\bf f}_1) & ({\bf f}_2 \cdot {\bf f}_2)
\end{array} \right]
\;
\left[ \begin{array}{c}
 x_1 \\
 x_2
\end{array} \right]
\end{displaymath} (21)
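In code (a NumPy sketch with the same hypothetical vectors), equation (21) amounts to building a $2\times 2$ table of dot products and solving it:

\begin{verbatim}
import numpy as np

f1 = np.array([1.0, 2.0, 0.0])    # hypothetical regressors and data
f2 = np.array([0.5, 1.0, 3.0])
d  = np.array([1.2, 2.3, 0.9])

rhs = np.array([np.dot(f1, d), np.dot(f2, d)])        # left side of (21)
mat = np.array([[np.dot(f1, f1), np.dot(f1, f2)],     # dot-product matrix
                [np.dot(f2, f1), np.dot(f2, f2)]])

x = np.linalg.solve(mat, rhs)     # the least-squares coefficients x1, x2
print(x)
\end{verbatim}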
It is customary to use matrix notation without dot products. To do this, we need some additional definitions. To clarify these definitions, we inspect vectors ${\bf f}_1$, ${\bf f}_2$, and ${\bf d}$ of three components. Thus
\begin{displaymath}
\bold F \eq [ {\bf f}_1 \quad {\bf f}_2 ] \eq
\left[ \begin{array}{cc}
 f_{11} & f_{12} \\
 f_{21} & f_{22} \\
 f_{31} & f_{32}
\end{array} \right]
\end{displaymath} (22)
Likewise, the transposed matrix $\bold F'$ is defined by
\begin{displaymath}
\bold F' \eq
\left[ \begin{array}{ccc}
 f_{11} & f_{21} & f_{31} \\
 f_{12} & f_{22} & f_{32}
\end{array} \right]
\end{displaymath} (23)
The matrix in equation (21) contains dot products. Matrix multiplication is an abstract way of representing the dot products:  
\begin{displaymath}
\left[ \begin{array}{cc}
 ({\bf f}_1 \cdot {\bf f}_1) & ({\bf f}_1 \cdot {\bf f}_2) \\
 ({\bf f}_2 \cdot {\bf f}_1) & ({\bf f}_2 \cdot {\bf f}_2)
\end{array} \right]
\eq
\left[ \begin{array}{ccc}
 f_{11} & f_{21} & f_{31} \\
 f_{12} & f_{22} & f_{32}
\end{array} \right]
\left[ \begin{array}{cc}
 f_{11} & f_{12} \\
 f_{21} & f_{22} \\
 f_{31} & f_{32}
\end{array} \right]
\end{displaymath} (24)
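A short check (NumPy, hypothetical vectors) shows the matrix product $\bold F' \bold F$ reproducing the table of dot products in equation (24):

\begin{verbatim}
import numpy as np

f1 = np.array([1.0, 2.0, 0.0])
f2 = np.array([0.5, 1.0, 3.0])
F  = np.column_stack([f1, f2])    # equation (22)

dots = np.array([[np.dot(f1, f1), np.dot(f1, f2)],
                 [np.dot(f2, f1), np.dot(f2, f2)]])

print(np.allclose(F.T @ F, dots))   # True: F'F is the dot-product table
\end{verbatim}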
Thus, equation (21) without dot products is
\begin{displaymath}
\left[ \begin{array}{ccc}
 f_{11} & f_{21} & f_{31} \\
 f_{12} & f_{22} & f_{32}
\end{array} \right]
\left[ \begin{array}{c}
 d_1 \\ d_2 \\ d_3
\end{array} \right]
\eq
\left[ \begin{array}{ccc}
 f_{11} & f_{21} & f_{31} \\
 f_{12} & f_{22} & f_{32}
\end{array} \right]
\left[ \begin{array}{cc}
 f_{11} & f_{12} \\
 f_{21} & f_{22} \\
 f_{31} & f_{32}
\end{array} \right]
\left[ \begin{array}{c}
 x_1 \\ x_2
\end{array} \right]
\end{displaymath} (25)
which has the matrix abbreviation  
 \begin{displaymath}
\bold F' {\bf d}\eq ( \bold F' \; \bold F ) \bold x\end{displaymath} (26)
Equation (26) is the classic result of least-squares fitting of data to a collection of regressors. Obviously, the same matrix form applies when there are more than two regressors and each vector has more than three components. Equation (26) leads to an analytic solution for $\bold x$ using an inverse matrix. To solve formally for the unknown $\bold x$, we premultiply by the inverse matrix $( \bold F' \; \bold F )^{-1}$:

 
 \begin{displaymath}
\bold x \eq
( \bold F' \; \bold F )^{-1} \;
\bold F' {\bf d}\end{displaymath} (27)
Equation (27) is the central result of least-squares theory. We see it everywhere.
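As a sketch (NumPy, hypothetical $\bold F$ and $\bold d$), equation (27) can be evaluated literally with an explicit inverse, or, equivalently and more stably, by handing the system (26) to a linear solver:

\begin{verbatim}
import numpy as np

F = np.array([[1.0, 0.5],         # hypothetical regressors as columns
              [2.0, 1.0],
              [0.0, 3.0]])
d = np.array([1.2, 2.3, 0.9])     # hypothetical data

x_inv   = np.linalg.inv(F.T @ F) @ (F.T @ d)   # equation (27), literally
x_solve = np.linalg.solve(F.T @ F, F.T @ d)    # same answer, no inverse
print(np.allclose(x_inv, x_solve))             # True
\end{verbatim}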

In our first manipulation of matrix algebra, we move around some parentheses in (26):  
 \begin{displaymath}
\bold F' {\bf d}\eq \bold F' \; (\bold F \bold x )\end{displaymath} (28)
Moving the parentheses implies a regrouping of terms or a reordering of a computation. You can verify the validity of moving the parentheses if you write (28) in full as the set of two equations it represents. Equation (26) led to the ``analytic'' solution (27). In a later section on conjugate directions, we will see that equation (28) expresses better than (27) the philosophy of iterative methods.
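Numerically (a NumPy sketch with hypothetical values), the regrouping in (28) means $\bold F' (\bold F \bold x)$ can be computed with two matrix-vector products, without ever forming the matrix $\bold F' \bold F$; the grouping changes the computation but not the result:

\begin{verbatim}
import numpy as np

F = np.array([[1.0, 0.5],
              [2.0, 1.0],
              [0.0, 3.0]])        # hypothetical regressors
x = np.array([0.7, -0.2])

lhs = (F.T @ F) @ x               # form the small matrix first, as in (26)
rhs = F.T @ (F @ x)               # two matrix-vector products, as in (28)
print(np.allclose(lhs, rhs))      # True: only the grouping differs
\end{verbatim}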

Notice how equation (28) invites us to cancel the matrix $\bold F'$ from each side. We cannot do that of course, because $\bold F'$ is not a number, nor is it a square matrix with an inverse. If you really want to cancel the matrix $\bold F'$, you may, but the equation is then only an approximation that restates our original goal (11):
 \begin{displaymath}
{\bf d}\quad \approx \quad \bold F \bold x\end{displaymath} (29)

A speedy problem solver might ignore the mathematics covering the previous page, study his or her application until he or she is able to write the statement of goals (29) = (11), premultiply by $\bold F'$, replace $\approx$ by $=$, getting (26), and take (26) to a simultaneous equation-solving program to get $\bold x$.

What I call ``fitting goals'' are called ``regressions'' by statisticians. In common language the word regression means to ``trend toward a more primitive perfect state,'' which vaguely resembles reducing the size of (energy in) the residual $\bold r = \bold F \bold x - \bold d$. Formally this is often written as:
 \begin{displaymath}
\min_{\bold x} \ \ \vert\vert \bold F \bold x - \bold d \vert\vert\end{displaymath} (30)
The notation above with two pairs of vertical lines looks like a double absolute value, but we can understand it as a reminder to square and sum all the components. This formal notation is more explicit about what is constant and what is variable during the fitting.
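In practice (a NumPy sketch, hypothetical $\bold F$ and $\bold d$), a routine such as np.linalg.lstsq minimizes the quantity in (30) directly:

\begin{verbatim}
import numpy as np

F = np.array([[1.0, 0.5],
              [2.0, 1.0],
              [0.0, 3.0]])        # hypothetical regressors
d = np.array([1.2, 2.3, 0.9])     # hypothetical data

x, *_ = np.linalg.lstsq(F, d, rcond=None)   # minimize ||Fx - d||
print(x)                                    # same x as the normal equations
print(np.linalg.norm(F @ x - d))            # length of the residual
\end{verbatim}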

