In engineering uses,
a vector has three scalar components that
correspond to the three dimensions of the space in which we live.
In least-squares data analysis, a vector is a one-dimensional array
that can contain many different things.
Such an array is an ``abstract vector.''
For example, in earthquake studies,
the vector might contain the time
an earthquake began, as well as its latitude, longitude, and depth.
Alternatively, the abstract vector
might contain as many components as there are seismometers,
and each component might be the arrival time of an earthquake wave.
Used in signal analysis,
the vector might contain the values of a signal
at successive instants in time or,
alternatively, a collection of signals.
These signals might be ``multiplexed'' (interlaced)
or ``demultiplexed'' (all of each signal preceding the next).
When used in image analysis,
the one-dimensional array might contain an image,
which could itself be thought of as an array of signals.
Vectors, including abstract vectors,
are usually denoted by boldface letters such as $\mathbf{d}$ and $\mathbf{f}$.
Like physical vectors, abstract vectors are orthogonal
when their dot product vanishes, for example $\mathbf{f}_1 \cdot \mathbf{f}_2 = 0$.
Orthogonal vectors are well known in physical space;
we will also encounter them in abstract vector space.
We consider first a hypothetical application
with one data vector $\mathbf{d}$ and two
fitting vectors $\mathbf{f}_1$ and $\mathbf{f}_2$.
Each fitting vector is also known as a ``regressor.''
Our first task is to approximate the data vector $\mathbf{d}$
by a scaled combination of the two regressor vectors.
The scale factors $x_1$ and $x_2$
should be chosen so that the model matches the data; i.e.,
$$
\begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix}
\;\approx\;
x_1 \begin{bmatrix} f_{11} \\ f_{21} \\ f_{31} \end{bmatrix}
+
x_2 \begin{bmatrix} f_{12} \\ f_{22} \\ f_{32} \end{bmatrix}
\tag{11}
$$
Notice that we could take the partial derivative
of the data in (11) with respect to an unknown,
say $x_1$,
and the result is the regressor $\mathbf{f}_1$.
The partial derivative of all theoretical data
with respect to any model parameter
gives a regressor.
A regressor is a column in the
matrix of partial derivatives, $\partial d_i / \partial x_j$.
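To make this point concrete, here is a minimal numerical sketch of my own (not from the original text), using NumPy and made-up numbers; the names `f1`, `f2`, `x1`, `x2` simply mirror the symbols above. A finite-difference derivative of the modeled data with respect to $x_1$ reproduces $\mathbf{f}_1$, as claimed.

```python
import numpy as np

# Hypothetical regressors (the fitting vectors) and scale factors -- made-up numbers.
f1 = np.array([1.0, 2.0, 3.0])
f2 = np.array([0.5, -1.0, 2.0])
x1, x2 = 0.7, -0.3

def modeled_data(x1, x2):
    """Theoretical data: a scaled combination of the two regressors."""
    return x1 * f1 + x2 * f2

# Finite-difference partial derivative of the modeled data with respect to x1.
eps = 1e-6
ddata_dx1 = (modeled_data(x1 + eps, x2) - modeled_data(x1, x2)) / eps

print(ddata_dx1)                    # approximately f1 = [1. 2. 3.]
print(np.allclose(ddata_dx1, f1))   # True (to finite-difference accuracy)
```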
The fitting goal (11) is often expressed in the more compact
mathematical matrix notation $\mathbf{d} \approx \mathbf{F}\mathbf{x}$, but in our derivation here
we will keep track of each component explicitly
and use mathematical matrix notation to summarize the final result.
Fitting the observed data $\mathbf{d}$ to its two theoretical parts
$\mathbf{f}_1 x_1$ and $\mathbf{f}_2 x_2$ can be expressed
as minimizing the length of the residual vector $\mathbf{r}$, where
$$
\mathbf{r} \;=\; \mathbf{f}_1 x_1 + \mathbf{f}_2 x_2 - \mathbf{d}
\tag{12}
$$

$$
\begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix}
\;=\;
x_1 \begin{bmatrix} f_{11} \\ f_{21} \\ f_{31} \end{bmatrix}
+
x_2 \begin{bmatrix} f_{12} \\ f_{22} \\ f_{32} \end{bmatrix}
-
\begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix}
\tag{13}
$$
We use a dot product to construct a sum of squares (also called a ``quadratic form'')
of the components of the residual vector:

$$
Q(x_1, x_2) \;=\; \mathbf{r} \cdot \mathbf{r}
\tag{14}
$$

$$
Q(x_1, x_2) \;=\; (\mathbf{f}_1 x_1 + \mathbf{f}_2 x_2 - \mathbf{d}) \cdot (\mathbf{f}_1 x_1 + \mathbf{f}_2 x_2 - \mathbf{d})
\tag{15}
$$
To find the gradient of the quadratic form $Q(x_1,x_2)$,
you might be tempted to expand out the dot product into all nine terms
and then differentiate.
It is less cluttered, however, to remember the product rule, that

$$
\frac{d}{dx}\,(\mathbf{r} \cdot \mathbf{r})
\;=\;
\frac{d\mathbf{r}}{dx} \cdot \mathbf{r} \;+\; \mathbf{r} \cdot \frac{d\mathbf{r}}{dx}
\;=\;
2\,\mathbf{r} \cdot \frac{d\mathbf{r}}{dx}
\tag{16}
$$
Thus, the gradient of $Q(x_1,x_2)$ is defined by its two components:

$$
\frac{\partial Q}{\partial x_1} \;=\; 2\,\mathbf{f}_1 \cdot (\mathbf{f}_1 x_1 + \mathbf{f}_2 x_2 - \mathbf{d})
\tag{17}
$$

$$
\frac{\partial Q}{\partial x_2} \;=\; 2\,\mathbf{f}_2 \cdot (\mathbf{f}_1 x_1 + \mathbf{f}_2 x_2 - \mathbf{d})
\tag{18}
$$
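Before moving on, here is a quick numerical check of (17)-(18), a sketch of my own with invented numbers: the analytic gradient obtained from the product rule agrees with a finite-difference gradient of $Q$.

```python
import numpy as np

# Made-up regressors, data, and trial scale factors (purely illustrative).
f1 = np.array([1.0, 2.0, 3.0])
f2 = np.array([0.5, -1.0, 2.0])
d  = np.array([2.0, 1.0, 4.0])
x1, x2 = 0.7, -0.3

def Q(x1, x2):
    """Quadratic form: sum of squared residual components, equations (14)-(15)."""
    r = x1 * f1 + x2 * f2 - d
    return np.dot(r, r)

# Analytic gradient from the product rule: dQ/dx_i = 2 f_i . r, equations (17)-(18).
r = x1 * f1 + x2 * f2 - d
grad_analytic = 2.0 * np.array([np.dot(f1, r), np.dot(f2, r)])

# Central finite-difference gradient of Q for comparison.
eps = 1e-6
grad_fd = np.array([(Q(x1 + eps, x2) - Q(x1 - eps, x2)) / (2 * eps),
                    (Q(x1, x2 + eps) - Q(x1, x2 - eps)) / (2 * eps)])

print(grad_analytic, grad_fd)
print(np.allclose(grad_analytic, grad_fd, atol=1e-4))   # True
```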
Setting these derivatives to zero and using
$(\mathbf{f}_1 \cdot \mathbf{f}_2) = (\mathbf{f}_2 \cdot \mathbf{f}_1)$, etc.,
we get

$$
(\mathbf{f}_1 \cdot \mathbf{d}) \;=\; (\mathbf{f}_1 \cdot \mathbf{f}_1)\,x_1 \;+\; (\mathbf{f}_1 \cdot \mathbf{f}_2)\,x_2
\tag{19}
$$

$$
(\mathbf{f}_2 \cdot \mathbf{d}) \;=\; (\mathbf{f}_2 \cdot \mathbf{f}_1)\,x_1 \;+\; (\mathbf{f}_2 \cdot \mathbf{f}_2)\,x_2
\tag{20}
$$
We can use these two equations to solve for
the two unknowns $x_1$ and $x_2$.
Writing this expression in matrix notation, we have

$$
\begin{bmatrix} \mathbf{f}_1 \cdot \mathbf{d} \\ \mathbf{f}_2 \cdot \mathbf{d} \end{bmatrix}
\;=\;
\begin{bmatrix}
\mathbf{f}_1 \cdot \mathbf{f}_1 & \mathbf{f}_1 \cdot \mathbf{f}_2 \\
\mathbf{f}_2 \cdot \mathbf{f}_1 & \mathbf{f}_2 \cdot \mathbf{f}_2
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\tag{21}
$$
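The two-by-two system (19)-(21) can be assembled and solved directly from dot products. The sketch below is my own illustration with invented numbers, not code from the original text; it also confirms that the resulting scale factors drive the gradient components (17)-(18) to zero.

```python
import numpy as np

# Made-up regressors and data (illustrative only).
f1 = np.array([1.0, 2.0, 3.0])
f2 = np.array([0.5, -1.0, 2.0])
d  = np.array([2.0, 1.0, 4.0])

# The 2x2 system of equations (19)-(20), built from dot products.
lhs = np.array([[np.dot(f1, f1), np.dot(f1, f2)],
                [np.dot(f2, f1), np.dot(f2, f2)]])
rhs = np.array([np.dot(f1, d), np.dot(f2, d)])

x1, x2 = np.linalg.solve(lhs, rhs)
print(x1, x2)

# The chosen scale factors make the gradient components (17)-(18) vanish.
r = x1 * f1 + x2 * f2 - d
print(np.dot(f1, r), np.dot(f2, r))   # both essentially zero
```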
It is customary to use matrix notation without dot products.
To do this, we need some additional definitions.
To clarify these definitions,
we inspect vectors
$\mathbf{f}_1$, $\mathbf{f}_2$, and $\mathbf{d}$ of three components.
Thus

$$
\mathbf{F}
\;=\;
\bigl[\,\mathbf{f}_1 \;\; \mathbf{f}_2\,\bigr]
\;=\;
\begin{bmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \\ f_{31} & f_{32} \end{bmatrix}
\tag{22}
$$
Likewise, the transposed matrix $\mathbf{F}'$ is defined by

$$
\mathbf{F}'
\;=\;
\begin{bmatrix} f_{11} & f_{21} & f_{31} \\ f_{12} & f_{22} & f_{32} \end{bmatrix}
\tag{23}
$$
The matrix in equation (21)
contains dot products.
Matrix multiplication is an abstract way of representing the dot products:
$$
\begin{bmatrix}
\mathbf{f}_1 \cdot \mathbf{f}_1 & \mathbf{f}_1 \cdot \mathbf{f}_2 \\
\mathbf{f}_2 \cdot \mathbf{f}_1 & \mathbf{f}_2 \cdot \mathbf{f}_2
\end{bmatrix}
\;=\;
\begin{bmatrix} f_{11} & f_{21} & f_{31} \\ f_{12} & f_{22} & f_{32} \end{bmatrix}
\begin{bmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \\ f_{31} & f_{32} \end{bmatrix}
\tag{24}
$$
Thus, equation (21) without dot products is
$$
\begin{bmatrix} f_{11} & f_{21} & f_{31} \\ f_{12} & f_{22} & f_{32} \end{bmatrix}
\begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix}
\;=\;
\begin{bmatrix} f_{11} & f_{21} & f_{31} \\ f_{12} & f_{22} & f_{32} \end{bmatrix}
\begin{bmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \\ f_{31} & f_{32} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\tag{25}
$$
which has the matrix abbreviation
$$
\mathbf{F}'\mathbf{d} \;=\; \bigl(\mathbf{F}'\mathbf{F}\bigr)\,\mathbf{x}
\tag{26}
$$
Equation
(26)
is the classic result of least-squares
fitting of data to a collection of regressors.
Obviously, the same matrix form applies when there are more than
two regressors and each vector has more than three components.
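The following sketch (my own illustration, with the same kind of invented numbers) confirms that $\mathbf{F}'\mathbf{F}$ and $\mathbf{F}'\mathbf{d}$ reproduce the dot products of (21) and (24), so that solving (26) is the same computation as solving the dot-product system.

```python
import numpy as np

# Illustrative regressors packed as columns of F, plus a data vector d.
f1 = np.array([1.0, 2.0, 3.0])
f2 = np.array([0.5, -1.0, 2.0])
d  = np.array([2.0, 1.0, 4.0])

F  = np.column_stack([f1, f2])   # equation (22): columns are the regressors
Ft = F.T                         # equation (23): the transposed matrix

# F'F reproduces the matrix of dot products in (21)/(24),
# and F'd reproduces the right-hand-side dot products.
print(Ft @ F)
print(np.array([[f1 @ f1, f1 @ f2], [f2 @ f1, f2 @ f2]]))
print(Ft @ d, np.array([f1 @ d, f2 @ d]))

# Equation (26): F'd = (F'F) x, solved as a small square linear system.
x = np.linalg.solve(Ft @ F, Ft @ d)
print(x)
```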
Equation
(26)
leads to an analytic solution for $\mathbf{x}$ using an inverse matrix.
To solve formally for the unknown $\mathbf{x}$, we premultiply by the inverse matrix $(\mathbf{F}'\mathbf{F})^{-1}$:

$$
\mathbf{x} \;=\; \bigl(\mathbf{F}'\mathbf{F}\bigr)^{-1}\,\mathbf{F}'\mathbf{d}
\tag{27}
$$
Equation (27) is the central result of
least-squares theory.
We see it everywhere.
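In practice, (27) is usually evaluated by solving the square system (26) rather than by forming the explicit inverse, although the result is the same. The sketch below, with made-up numbers, compares the textbook formula, a direct linear solve, and NumPy's own least-squares routine; all three agree.

```python
import numpy as np

# Illustrative 3x2 fitting matrix F and data vector d (made-up numbers).
F = np.array([[1.0,  0.5],
              [2.0, -1.0],
              [3.0,  2.0]])
d = np.array([2.0, 1.0, 4.0])

# Equation (27): x = (F'F)^{-1} F'd, evaluated three equivalent ways.
x_analytic  = np.linalg.inv(F.T @ F) @ (F.T @ d)      # explicit inverse
x_solve     = np.linalg.solve(F.T @ F, F.T @ d)       # solve the square system (26)
x_lstsq, *_ = np.linalg.lstsq(F, d, rcond=None)       # library least-squares solver

print(x_analytic)
print(np.allclose(x_analytic, x_solve), np.allclose(x_analytic, x_lstsq))
```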
In our first manipulation of matrix algebra,
we move around some parentheses in
(26):
$$
\mathbf{F}'\mathbf{d} \;=\; \mathbf{F}'\bigl(\mathbf{F}\mathbf{x}\bigr)
\tag{28}
$$
Moving the parentheses implies a regrouping of terms
or a reordering of a computation.
You can verify the validity of moving the parentheses
if you write (28) in full as the set of two equations it represents.
Equation
(26)
led to the ``analytic'' solution (27).
In a later section on conjugate directions,
we will see that equation
(28)
expresses better than
(27)
the philosophy of iterative methods.
Notice how equation
(28)
invites us to cancel the matrix $\mathbf{F}'$
from each side.
We cannot do that of course, because
$\mathbf{F}'$ is not a number, nor is it a square matrix with an inverse.
If you really want to cancel the matrix $\mathbf{F}'$, you may,
but the equation is then only an approximation
that restates our original goal (11):
$$
\mathbf{d} \;\approx\; \mathbf{F}\mathbf{x}
\tag{29}
$$
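A quick numerical illustration of this distinction (my own sketch, with invented numbers): at the least-squares solution, equation (28) holds exactly, i.e. $\mathbf{F}'\mathbf{r} = \mathbf{0}$, so the residual is orthogonal to every regressor, while equation (29) remains only an approximation.

```python
import numpy as np

# Made-up overdetermined example: three data values, two unknowns.
F = np.array([[1.0,  0.5],
              [2.0, -1.0],
              [3.0,  2.0]])
d = np.array([2.0, 1.0, 4.0])

x = np.linalg.solve(F.T @ F, F.T @ d)
r = F @ x - d

# Equation (28) holds exactly: F'(Fx) = F'd, i.e. F'r = 0,
# so the residual is orthogonal to every column of F.
print(F.T @ r)     # essentially zero

# Equation (29) is only approximate: Fx does not equal d exactly.
print(F @ x - d)   # generally nonzero
```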
A speedy problem solver might
ignore the mathematics covering the previous page,
study his or her application until he or she
is able to write the statement of goals
(29) = (11),
premultiply by $\mathbf{F}'$, replace $\approx$ by $=$,
getting (26),
and take
(26)
to a simultaneous equation-solving program to get $\mathbf{x}$.
What I call ``fitting goals'' are called
``regressions'' by statisticians.
In common language the word regression means
to ``trend toward a more primitive perfect state,''
which vaguely resembles reducing the size of (energy in)
the residual $\mathbf{r}$. Formally this is often written as:
$$
\min_{\mathbf{x}} \;\bigl\|\,\mathbf{F}\mathbf{x} - \mathbf{d}\,\bigr\|
\tag{30}
$$
The notation above with two pairs of vertical lines
looks like double absolute value,
but we can understand it as a reminder to square and sum all the components.
This formal notation is more explicit
about what is constant and what is variable during the fitting.
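As a final sanity check (an illustration of mine, not from the text), the sketch below evaluates $\|\mathbf{F}\mathbf{x} - \mathbf{d}\|$ as the square root of the summed squared residual components and verifies that random perturbations of the least-squares solution only increase it.

```python
import numpy as np

# Made-up example to illustrate the norm notation in (30).
F = np.array([[1.0,  0.5],
              [2.0, -1.0],
              [3.0,  2.0]])
d = np.array([2.0, 1.0, 4.0])

def misfit(x):
    """||Fx - d||: square the residual components, sum them, take the root."""
    r = F @ x - d
    return np.sqrt(np.sum(r**2))

x_best = np.linalg.solve(F.T @ F, F.T @ d)

# Nearby perturbations of x_best give a larger (or equal) misfit.
rng = np.random.default_rng(0)
trials = [misfit(x_best + 0.1 * rng.standard_normal(2)) for _ in range(5)]
print(misfit(x_best), min(trials))   # the first value is the smallest
```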