
MULTIVARIATE LEAST SQUARES

As described at the beginning of chapter [*], signals and images will be specified here by numbers packed into abstract vectors. We consider first a hypothetical application with one data vector ${\bf d}$ and two fitting vectors $\bold b_1$ and $\bold b_2$. Each fitting vector is also known as a ``regressor.'' Our first task is to try to approximate the data vector ${\bf d}$ by a scaled combination of the two regressor vectors. The scale factors $x_1$ and $x_2$ should be chosen so that the model matches the data, i.e.,
 \begin{displaymath}
{\bf d}\quad \approx \quad \bold b_1 x_1 + \bold b_2 x_2\end{displaymath} (1)
For example, if I print the characters ``$\bold P$'' and ``$\bold b$'' on top of each other, I get ``$\bold P \!\!\!\! \bold b$,'' which looks something like an image of the letter ``$\bold B$.'' This is analogous to $\bold d \approx \bold b_1 + \bold b_2$. More realistically, $\bold d$ could contain a sawtooth function of time, and $\bold b_1$ and $\bold b_2$ could be sinusoids. Still more realistically, $\bold d$ could be an observed 2-D wave field, and $\bold b_1$ and $\bold b_2$ could be theoretical data in two parts, where the contribution of each part is to be learned by fitting. (One part could be primary reflections and the other multiple reflections.)
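To make the fitting goal concrete, here is a minimal Python sketch (using NumPy; the sawtooth data and the two sinusoidal regressors are hypothetical choices, not taken from the text) that merely sets up the vectors appearing in (1):

import numpy as np

n = 50                                   # number of samples in each abstract vector
t = np.arange(n) / n                     # a "time" axis on [0, 1)

# Two hypothetical regressors: sinusoids of different frequency.
b1 = np.sin(2 * np.pi * t)
b2 = np.sin(4 * np.pi * t)

# Hypothetical data: one period of a sawtooth that we hope to
# approximate as d ~ b1*x1 + b2*x2 for some scale factors x1, x2.
d = t - np.floor(t + 0.5)                # ramps 0 to 0.5, jumps to -0.5, ramps back toward 0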

Notice that we could take the partial derivative of the data in (1) with respect to an unknown, say $x_1$, and the result is the regressor $\bold b_1$.

The partial derivative of all data with respect to any model parameter gives a regressor. A regressor is a column in the partial-derivative matrix.
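A quick numerical check of this statement (a sketch with arbitrary made-up vectors; the linear model below is a stand-in for the data of equation (1)): perturb $x_1$, difference the modeled data, and the column ${\bf b}_1$ comes back.

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 2))          # partial-derivative matrix; columns are b1 and b2
x = np.array([1.3, -0.7])                # arbitrary values of the unknowns x1, x2

def modeled_data(x):
    # The model of equation (1): b1*x1 + b2*x2, i.e. B @ x.
    return B @ x

eps = 1e-6
# Finite-difference partial derivative of the modeled data with respect to x1.
ddata_dx1 = (modeled_data(x + np.array([eps, 0.0])) - modeled_data(x)) / eps

print(np.allclose(ddata_dx1, B[:, 0]))   # True: the regressor b1 is that derivative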

Equation (1) is often expressed in the more compact matrix notation ${\bf d} \approx \bf B \bf x$, but in our derivation here we will keep track of each component explicitly and use matrix notation only to summarize the final result. Fitting the data $\bold d$ to its two theoretical components can be expressed as minimizing the length of the residual vector $\bold r$, where
 \begin{displaymath}
\bold r \eq {\bf d}- \bold b_1 x_1 - \bold b_2 x_2\end{displaymath} (2)
So we construct a sum of squares (also called a ``quadratic form'') of the components of the residual vector by using a dot product:
\begin{eqnarray}
Q(x_1,x_2) &=& \bold r \cdot \bold r \\
 &=& ({\bf d} - {\bf b}_1 x_1 - {\bf b}_2 x_2) \cdot ({\bf d} - {\bf b}_1 x_1 - {\bf b}_2 x_2)
\end{eqnarray} (3,4)
The gradient of $Q(x_1,x_2)$ is defined by its two components:

\begin{eqnarray}
{\partial Q \over \partial x_1} &=& - {\bf b}_1 \cdot ({\bf d} - {\bf b}_1 x_1 - {\bf b}_2 x_2) - ({\bf d} - {\bf b}_1 x_1 - {\bf b}_2 x_2) \cdot {\bf b}_1 \\
{\partial Q \over \partial x_2} &=& - {\bf b}_2 \cdot ({\bf d} - {\bf b}_1 x_1 - {\bf b}_2 x_2) - ({\bf d} - {\bf b}_1 x_1 - {\bf b}_2 x_2) \cdot {\bf b}_2
\end{eqnarray} (5,6)
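As a numerical sanity check on (5) and (6) (again a sketch with arbitrary vectors of my own choosing), the analytic gradient can be compared against finite differences of $Q$:

import numpy as np

rng = np.random.default_rng(1)
d  = rng.standard_normal(6)
b1 = rng.standard_normal(6)
b2 = rng.standard_normal(6)

def Q(x1, x2):
    # Sum of squared residual components, r . r, as in (3).
    r = d - b1 * x1 - b2 * x2
    return r @ r

x1, x2 = 0.4, -1.1
r = d - b1 * x1 - b2 * x2

# Analytic gradient from equations (5) and (6); each component equals -2 (b_i . r).
grad_analytic = np.array([-b1 @ r - r @ b1,
                          -b2 @ r - r @ b2])

# Centered finite differences of Q.
eps = 1e-6
grad_numeric = np.array([(Q(x1 + eps, x2) - Q(x1 - eps, x2)) / (2 * eps),
                         (Q(x1, x2 + eps) - Q(x1, x2 - eps)) / (2 * eps)])

print(np.allclose(grad_analytic, grad_numeric))   # True
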
Setting these derivatives to zero and using $({\bf b}_1 \cdot {\bf b}_2)=({\bf b}_2 \cdot {\bf b}_1)$ etc., we get
\begin{eqnarray}
({\bf b}_1 \cdot {\bf d}) &=& ({\bf b}_1 \cdot {\bf b}_1) x_1 + ({\bf b}_1 \cdot {\bf b}_2) x_2 \\
({\bf b}_2 \cdot {\bf d}) &=& ({\bf b}_2 \cdot {\bf b}_1) x_1 + ({\bf b}_2 \cdot {\bf b}_2) x_2
\end{eqnarray} (7,8)
which are two equations we can solve for the two unknowns $x_1$ and $x_2$. Writing them in matrix notation, we have
\begin{displaymath}
\left[
\begin{array}{c}
({\bf b}_1 \cdot {\bf d}) \\
({\bf b}_2 \cdot {\bf d})
\end{array}
\right]
\eq
\left[
\begin{array}{cc}
({\bf b}_1 \cdot {\bf b}_1) & ({\bf b}_1 \cdot {\bf b}_2) \\
({\bf b}_2 \cdot {\bf b}_1) & ({\bf b}_2 \cdot {\bf b}_2)
\end{array}
\right]
\;
\left[
\begin{array}{c}
x_1 \\
x_2
\end{array}
\right]
\end{displaymath} (9)
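The 2x2 system (9) can be formed and solved directly from dot products. A minimal sketch (made-up vectors again; numpy.linalg.solve stands in for whatever simultaneous-equation solver is at hand):

import numpy as np

rng = np.random.default_rng(2)
d  = rng.standard_normal(10)
b1 = rng.standard_normal(10)
b2 = rng.standard_normal(10)

# Right-hand side and matrix of dot products, exactly as in equation (9).
rhs = np.array([b1 @ d, b2 @ d])
mat = np.array([[b1 @ b1, b1 @ b2],
                [b2 @ b1, b2 @ b2]])

x1, x2 = np.linalg.solve(mat, rhs)       # the two unknown scale factors
print(x1, x2)
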
It is customary to use matrix notation without dot products. For this we need some additional definitions. To clarify these definitions, I choose the number of components in the vectors ${\bf b}_1$, ${\bf b}_2$, and ${\bf d}$ to be three. Thus I can explicitly write the matrix $\bold B$ in full as
\begin{displaymath}
\bold B \eq [ {\bf b}_1 \quad {\bf b}_2 ] \eq
\left[
\begin{array}{cc}
b_{11} & b_{12} \\
b_{21} & b_{22} \\
b_{31} & b_{32}
\end{array}
\right]
\end{displaymath} (10)
Likewise, the transposed matrix $\bold B'$ is defined by
\begin{displaymath}
\bold B' \eq
\left[
\begin{array}{ccc}
b_{11} & b_{21} & b_{31} \\
b_{12} & b_{22} & b_{32}
\end{array}
\right]
\end{displaymath} (11)
The matrix in equation (9) contains dot products. Matrix multiplication is an abstract way of representing the dot products:  
\begin{displaymath}
\left[
\begin{array}{cc}
({\bf b}_1 \cdot {\bf b}_1) & ({\bf b}_1 \cdot {\bf b}_2) \\
({\bf b}_2 \cdot {\bf b}_1) & ({\bf b}_2 \cdot {\bf b}_2)
\end{array}
\right]
\eq
\left[
\begin{array}{ccc}
b_{11} & b_{21} & b_{31} \\
b_{12} & b_{22} & b_{32}
\end{array}
\right]
\left[
\begin{array}{cc}
b_{11} & b_{12} \\
b_{21} & b_{22} \\
b_{31} & b_{32}
\end{array}
\right]
\end{displaymath} (12)
Thus, equation (9) without dot products is
\begin{displaymath}
\left[
\begin{array}{ccc}
b_{11} & b_{21} & b_{31} \\
b_{12} & b_{22} & b_{32}
\end{array}
\right]
\left[
\begin{array}{c}
d_1 \\
d_2 \\
d_3
\end{array}
\right]
\eq
\left[
\begin{array}{ccc}
b_{11} & b_{21} & b_{31} \\
b_{12} & b_{22} & b_{32}
\end{array}
\right]
\left[
\begin{array}{cc}
b_{11} & b_{12} \\
b_{21} & b_{22} \\
b_{31} & b_{32}
\end{array}
\right]
\left[
\begin{array}{c}
x_1 \\
x_2
\end{array}
\right]
\end{displaymath} (13)
which has the matrix abbreviation  
 \begin{displaymath}
\bf B' {\bf d}\eq ( \bf B' \; \bf B ) \bf x\end{displaymath} (14)
Equation (14) is the classic result of least-squares fitting of data to a collection of regressors. Obviously, the same matrix form applies when there are more than two regressors and each vector has more than three components. Equation (14) leads to an analytic solution for $\bf x$ using an inverse matrix. To solve formally for the unknown $\bold x$, we premultiply by the inverse matrix $(\bf B' \; \bf B)^{-1}$:

 
 \begin{displaymath}
\bf x \eq
( \bf B' \; \bf B )^{-1} \;
\bf B' {\bf d}\end{displaymath} (15)
Equation (15) is the central result of least-squares analysis. We see it everywhere.
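In code, equation (15) might look like the sketch below (arbitrary test vectors, not from the text). In practice one avoids forming the explicit inverse; a linear solve applied to the normal equations (14), or a library least-squares routine that minimizes the residual directly, gives the same answer.

import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((100, 2))        # two regressors, each with 100 components
d = rng.standard_normal(100)             # data vector

# Equation (15): x = (B'B)^{-1} B'd, written as a solve rather than an explicit inverse.
x_normal = np.linalg.solve(B.T @ B, B.T @ d)

# The same answer from a routine that minimizes ||d - B x|| directly.
x_lstsq, *_ = np.linalg.lstsq(B, d, rcond=None)

print(np.allclose(x_normal, x_lstsq))    # True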

Equation (12) is an example of what is called a ``covariance matrix.'' Such matrices usually need to be inverted, and in equation (15) you already see an example of the occurrence of an inverse covariance matrix. Any description of an application of least-squares fitting will generally include some discussion of the covariance matrix--how it will be computed, assumed, or estimated, and how its inverse will be found or approximated. In chapter [*] we found the need to weight residuals by the inverse of their scale. That was our first example of the occurrence of an inverse covariance matrix--although in that case the matrix size was only $1\times 1$.

In our first manipulation of matrix algebra, we move around some parentheses in (14):  
 \begin{displaymath}
\bf B' {\bf d}\eq \bf B' \; (\bf B \bf x )\end{displaymath} (16)
Moving the parentheses implies a regrouping of terms or a reordering of a computation. You can verify the validity of moving the parentheses by writing (16) in full as the set of two equations it represents. Equation (14) led to the ``analytic'' solution (15). In a later section on conjugate gradients, we will see that equation (16) expresses the philosophy of computation better than (15) does.
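To see the computational point, note that the right side of (16) can be applied to any trial $\bold x$ without ever forming the matrix $\bf B' \bf B$: apply $\bf B$, then apply $\bf B'$. A small sketch of this grouping (iterative methods such as conjugate gradients rely on exactly these two matrix-vector products):

import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((1000, 2))       # tall matrix of regressors
x = rng.standard_normal(2)

lhs_grouped = B.T @ (B @ x)              # B'(Bx): two matrix-vector products, B'B never formed
lhs_formed  = (B.T @ B) @ x              # (B'B)x: forms the small square matrix first

print(np.allclose(lhs_grouped, lhs_formed))   # True: same numbers, different computation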

Notice how equation (16) invites us to cancel the matrix $\bf B'$ from each side. We cannot do that, of course, because $\bf B'$ is not a number, nor is it a square matrix with an inverse. If you really want to cancel the matrix $\bf B'$, you may, but the equation is then only an approximation that restates our original goal (1):
 \begin{displaymath}
{\bf d}\quad \approx \quad \bf B \bf x\end{displaymath} (17)

A speedy problem solver might ignore the mathematics covering the previous page, study his or her application until he or she is able to write the statement of wishes (17) = (1), premultiply by $\bf B'$, replace $\approx$ by $=$ to get (14), and take (14) to a simultaneous equation-solving program to get $\bold x$.

The formal literature does not speak of a ``statement of wishes'' but of a ``regression,'' which is the same concept. In a regression, there is an abstract vector called the residual, $\bold r = \bold d - \bold B \bold x$, whose components should all be small. Formally this is often written as:
\begin{displaymath}
\min_{\bold x} \ \ \vert\vert \bold d - \bold B \bold x \vert\vert \end{displaymath} (18)
The notation above with two pairs of vertical lines looks like double absolute value, but we can understand it as a reminder to square and sum all the components. This notation is more explicit about what is being minimized, but I often find myself sketching out applications in the form of a ``statement of wishes,'' which I call a ``regression.''
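As a final sketch (made-up vectors once more), the double-bar quantity in (18) is computed from the squared residual components (numpy.linalg.norm sums them and takes a square root; minimizing either the sum or its square root picks the same $\bold x$), and the least-squares $\bold x$ of (15) is the choice that makes it smallest:

import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((50, 2))
d = rng.standard_normal(50)

x, *_ = np.linalg.lstsq(B, d, rcond=None)    # the x that minimizes ||d - B x||
r = d - B @ x

# The norm notation of (18): square the components, sum, take the square root.
print(np.isclose(np.linalg.norm(r), np.sqrt(np.sum(r ** 2))))          # True

# Perturbing x away from the least-squares solution can only increase the residual norm.
print(np.linalg.norm(d - B @ (x + np.array([0.1, -0.1]))) >= np.linalg.norm(r))   # True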



 