Next: Normal equations Up: MULTIVARIATE LEAST SQUARES Previous: MULTIVARIATE LEAST SQUARES

## Inside an abstract vector

In engineering uses, a vector has three scalar components that correspond to the three dimensions of the space in which we live. In least-squares data analysis, a vector is a one-dimensional array that can contain many different things. Such an array is an abstract vector.'' For example, in earthquake studies, the vector might contain the time an earthquake began, as well as its latitude, longitude, and depth. Alternatively, the abstract vector might contain as many components as there are seismometers, and each component might be the arrival time of an earthquake wave. Used in signal analysis, the vector might contain the values of a signal at successive instants in time or, alternatively, a collection of signals. These signals might be multiplexed'' (interlaced) or demultiplexed'' (all of each signal preceding the next). When used in image analysis, the one-dimensional array might contain an image, which could itself be thought of as an array of signals. Vectors, including abstract vectors, are usually denoted by boldface letters such as and .Like physical vectors, abstract vectors are orthogonal when their dot product vanishes: .Orthogonal vectors are well known in physical space; we will also encounter them in abstract vector space.

We consider first a hypothetical application with one data vector and two fitting vectors and .Each fitting vector is also known as a regressor." Our first task is to approximate the data vector by a scaled combination of the two regressor vectors. The scale factors x1 and x2 should be chosen so that the model matches the data; i.e.,
 (11)

Notice that we could take the partial derivative of the data in (11) with respect to an unknown, say x1, and the result is the regressor .

 The partial derivative of all theoretical data with respect to any model parameter gives a regressor. A regressor is a column in the matrix of partial-derivatives, .

The fitting goal (11) is often expressed in the more compact mathematical matrix notation ,but in our derivation here we will keep track of each component explicitly and use mathematical matrix notation to summarize the final result. Fitting the observed data to its two theoretical parts and can be expressed as minimizing the length of the residual vector , where
 (12) (13)
We use a dot product to construct a sum of squares (also called a quadratic form") of the components of the residual vector:
 (14) (15)
To find the gradient of the quadratic form Q(x1,x2), you might be tempted to expand out the dot product into all nine terms and then differentiate. It is less cluttered, however, to remember the product rule, that
 (16)
Thus, the gradient of Q(x1,x2) is defined by its two components:
 (17) (18)
Setting these derivatives to zero and using etc., we get
 (19) (20)
We can use these two equations to solve for the two unknowns x1 and x2. Writing this expression in matrix notation, we have
 (21)
It is customary to use matrix notation without dot products. To do this, we need some additional definitions. To clarify these definitions, we inspect vectors , , and of three components. Thus
 (22)
Likewise, the transposed matrix is defined by
 (23)
The matrix in equation (21) contains dot products. Matrix multiplication is an abstract way of representing the dot products:
 (24)
Thus, equation (21) without dot products is
 (25)
which has the matrix abbreviation
 (26)
Equation (26) is the classic result of least-squares fitting of data to a collection of regressors. Obviously, the same matrix form applies when there are more than two regressors and each vector has more than three components. Equation (26) leads to an analytic solution for using an inverse matrix. To solve formally for the unknown ,we premultiply by the inverse matrix :

 (27)
Equation (27) is the central result of least-squaresleast squares, central equation of theory. We see it everywhere.

In our first manipulation of matrix algebra, we move around some parentheses in (26):
 (28)
Moving the parentheses implies a regrouping of terms or a reordering of a computation. You can verify the validity of moving the parentheses if you write (28) in full as the set of two equations it represents. Equation (26) led to the analytic'' solution (27). In a later section on conjugate directions, we will see that equation (28) expresses better than (27) the philosophy of iterative methods.

Notice how equation (28) invites us to cancel the matrix from each side. We cannot do that of course, because is not a number, nor is it a square matrix with an inverse. If you really want to cancel the matrix , you may, but the equation is then only an approximation that restates our original goal (11):
 (29)

A speedy problem solver might ignore the mathematics covering the previous page, study his or her application until he or she is able to write the statement of goals goals, statement of (29) = (11), premultiply by ,replace by =, getting (26), and take (26) to a simultaneous equation-solving program to get .

What I call fitting goals'' are called regressions'' by statisticians. In common language the word regression means to trend toward a more primitive perfect state'' which vaguely resembles reducing the size of (energy in) the residual .Formally this is often written as:
 (30)
The notation above with two pairs of vertical lines looks like double absolute value, but we can understand it as a reminder to square and sum all the components. This formal notation is more explicit about what is constant and what is variable during the fitting.

Next: Normal equations Up: MULTIVARIATE LEAST SQUARES Previous: MULTIVARIATE LEAST SQUARES
Stanford Exploration Project
4/27/2004