

If you flipped a coin 100 times, you might get exactly 50 ``heads'' and 50 ``tails'', but more likely the split would fall somewhere between 60-40 and 40-60. Typically, how much deviation from 50 should you expect? The average (mean) value is 50, but a random sample almost always yields some other value, which is called the sample mean. We would like to know how much difference to expect between the sample mean and the true mean. The average squared difference is called the variance of the sample mean. For a very large sample, the sample mean should be proportionately much closer to the true mean than for a small sample. This idea leads to an uncertainty relation between the probable error in the estimated mean and the size of the sample. Let us be more precise.
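As a quick numerical illustration of this spread, here is a minimal simulation sketch (assuming NumPy is available; the trial count and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for repeatability
n_flips, n_trials = 100, 10000

# Each row is one experiment of 100 fair coin flips; count the heads.
heads = rng.integers(0, 2, size=(n_trials, n_flips)).sum(axis=1)

print("mean heads:", heads.mean())      # near 50
print("spread of heads:", heads.std())  # near 5, so a 45-55 split is typical
```

The spread is about 5 heads rather than 50; quantifying this effect is the goal of this section.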

The ``true value'' of the mean could be defined by flipping the coin $n$ times and conceiving of $n$ going to infinity. A more convenient definition of ``true value'' is that the experiment could be conceived of as having been done separately under identical conditions by an infinite number of people (an ensemble). Such an artifice will enable us to define a time-variable mean for coins which change with time.

The utility of the concept of an ensemble is often subjected to serious attack, both from the point of view of the theoretical foundations of statistics and from the point of view of experimentalists applying the techniques of statistics. Nonetheless, a great body of geophysical literature uses the artifice of assuming the existence of an unobservable ensemble. The advocates of using ensembles (the Gibbsians) have the advantage over their adversaries (the Bayesians) in that their mathematics is more tractable (and more explainable). So, let us begin!

A conceptual average over the ensemble, called an expectation, is denoted by the symbol $E$. The index for summation over the ensemble is never shown explicitly; every random variable is presumed to have one. Thus, the true mean at time $t$ may be defined as
\begin{displaymath}
m_t = E(x_t) \qquad (13)
\end{displaymath}
If the mean does not vary with time, we may write  
\begin{displaymath}
m = E(x_t) \qquad ({\rm all\ } t) \qquad (14)
\end{displaymath}

Likewise, we may be interested in a property of $x_t$ called its variance, a measure of variability about the mean, defined by
\begin{displaymath}
\sigma^2_t = E\,[(x_t - m_t)^2] \qquad (15)
\end{displaymath}
The random numbers $x_t$ could be defined in such a way that $\sigma$ or $m$ or both are either time-variable or constant. If both are constant, we have
\begin{displaymath}
\sigma^2 = E\,[(x_t - m)^2] \qquad (16)
\end{displaymath}
When manipulating algebraic expressions the symbol E behaves like a summation sign, namely  
\begin{displaymath}
E = \lim_{n \rightarrow \infty} {1 \over n} \sum^n_1 \qquad (17)
\end{displaymath}
Notice that the summation index is not given, since the sum is over the ensemble, not time.

Now let $x_t$ be a time series made up from identically distributed, independently chosen random numbers, so that $m$ and $\sigma$ do not depend on time. Suppose we have a sample of $n$ points of $x_t$ and are trying to determine the value of $m$. We could make an estimate $\hat{m}$ of the mean $m$ with the formula
\begin{displaymath}
\hat{m} = {1 \over n} \sum^n_{t = 1} x_t \qquad (18)
\end{displaymath}
A somewhat more elaborate method of estimating the mean would be to take a weighted average. Let $w_t$ define a set of weights normalized so that
\begin{displaymath}
\sum w_t = 1 \qquad (19)
\end{displaymath}
With these weights the more elaborate estimate $\hat{m}$ of the mean is  
\begin{displaymath}
\hat{m} = \sum w_t x_t \qquad (20)
\end{displaymath}
Actually (18) is just a special case of (20) where the weights are $w_t = 1/n$; $t = 1, 2, \ldots, n$.
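Both estimates can be checked numerically. The following sketch (assuming NumPy; the sample size and true mean are arbitrary choices) compares the plain average (18) with a weighted average (20) whose weights satisfy (19):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(loc=3.0, scale=1.0, size=n)   # sample with true mean m = 3

# Equation (18): plain average, i.e. weights w_t = 1/n.
m_hat_plain = x.mean()

# Equation (20): any weights summing to one, as required by (19).
w = rng.random(n)
w /= w.sum()
m_hat_weighted = w @ x

print(m_hat_plain, m_hat_weighted)           # both scatter around 3
```

Either way the estimate is unbiased; the derivation that follows shows how the choice of weights controls its scatter.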

Our objective in this section is to determine how far the estimated mean $\hat{m}$ is likely to be from the true mean $m$ for a sample of length $n$. One possible definition of this excursion $\Delta m$ is
\begin{eqnarray*}
(\Delta m)^2 &=& E\,[(\hat{m} - m)^2] \\
&=& E\,\left\{ \left[ \left( \sum w_t x_t \right) - m \right]^2 \right\} \qquad (21)
\end{eqnarray*}
Now utilize the fact that $m = m\sum w_t = \sum w_t m$:
\begin{eqnarray*}
(\Delta m)^2 &=& E\,\left\{ \left[ \sum_t w_t (x_t - m) \right]^2 \right\} \qquad (22) \\
&=& E\,\left[ \sum_t \sum_s w_t w_s (x_t - m)(x_s - m) \right] \qquad (23)
\end{eqnarray*}
Now the expectation symbol E may be regarded as a summation sign and brought inside the sums on t and s.  
\begin{displaymath}
(\Delta m)^2 = \sum_t \sum_s w_t w_s\, E\,\left[ (x_t - m)(x_s - m) \right] \qquad (26)
\end{displaymath}
By the independence of $x_t$ and $x_s$, the expectation on the right, that is, the sum over the ensemble, gives zero unless $s = t$. If $s = t$, then the expectation is the variance defined by (16). Thus, with the Kronecker delta $\delta_{ts}$, we have
\begin{eqnarray*}
(\Delta m)^2 &=& \sum_t \sum_s w_t w_s\, \sigma^2\, \delta_{ts} \qquad (27) \\
&=& \sum_t w^2_t\, \sigma^2 \qquad (28)
\end{eqnarray*}
\begin{displaymath}
\Delta m = \sigma \left( \sum_t w^2_t \right)^{1/2} \qquad (29)
\end{displaymath}
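Equation (29) is easy to verify over a simulated ensemble (a sketch assuming NumPy; the weights and $\sigma$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
sigma = 2.0
w = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # arbitrary weights with sum(w) = 1

# Simulate the ensemble: each row is one realization of the sample.
x = rng.normal(0.0, sigma, size=(20000, w.size))
dm_empirical = (x @ w).std()                   # scatter of m_hat about m = 0
dm_formula = sigma * np.sqrt((w ** 2).sum())   # equation (29)

print(dm_empirical, dm_formula)                # the two numbers agree
```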
Now let us examine this final result for $n$ weights, each of size $1/n$. For this case, we get
\begin{displaymath}
\Delta m = \sigma \left[ \sum^n_1 \left( {1 \over n} \right)^2 \right]^{1/2} = {\sigma \over \sqrt{n}} \qquad (30)
\end{displaymath}
This is the most important property of random numbers that is not intuitively obvious. For a zero-mean situation it may be expressed in words: ``$n$ random numbers of unit magnitude add up to a magnitude of about the square root of $n$.''
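The $\sigma/\sqrt{n}$ law of (30) can be seen directly by simulation (a sketch assuming NumPy; the sample sizes and ensemble size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.0
scatter = {}

# For each sample size n, average n random numbers many times over
# and measure the scatter of those sample means about the true mean 0.
for n in (10, 100, 1000):
    m_hats = rng.normal(0.0, sigma, size=(5000, n)).mean(axis=1)
    scatter[n] = m_hats.std()
    print(n, scatter[n], sigma / np.sqrt(n))   # empirical vs. sigma/sqrt(n)
```

A hundredfold increase in sample length buys only a tenfold reduction in the error of the mean.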

When one is trying to estimate the mean of a random series which has a time-variable mean, one faces a basic dilemma. If one includes a lot of numbers in the sum to get $\Delta m$ small, then $m$ may be changing while one is trying to measure it. In contrast, $\hat{m}$ measured from a short sample of the series might deviate greatly from the true $m$ (defined by an infinite sum over the ensemble at any point in time). This is the basic dilemma faced by a stockbroker when a client tells him, ``Since the market fluctuates a lot I'd like you to sell my stock sometime when the price is above the mean selling price.''

If we imagine that a time series is sampled every $\tau$ seconds and we let $\Delta t = n\tau$ denote the length of the sample, then (30) may be written as
\begin{displaymath}
(\Delta m)^2\, \Delta t = \sigma^2 \tau \qquad (31)
\end{displaymath}
It is clearly desirable to have both $\Delta m$ and $\Delta t$ as small as possible. If the original random numbers $x_t$ were correlated with one another, for example, if $x_t$ were an approximation to a continuous function, then the sum of the $n$ numbers would not cancel down to $\sqrt{n}$. This is expressed by the inequality
\begin{displaymath}
(\Delta m)^2\, \Delta t \quad\geq\quad \sigma^2 \tau \qquad (32)
\end{displaymath}
The inequality (32) may be called an uncertainty relation between accuracy and time resolution.

In considering other sets of weights one may take a definition of $\Delta t$ which is more physically sensible than $\tau$ times the number of weights. For example, if the weights $w_t$ are given by a sampled gaussian function as shown in Figure 2, then $\Delta t$ could be taken as the separation of half-amplitude points, the separation of $1/e$ points, the time span which includes 95 percent of the area, or it could be given many other ``sensible'' interpretations. Given a little slop in the definitions of $\Delta m$ and $\Delta t$, it is clear that the inequality (32) is not to be strictly applied.

Figure 2: Binomial coefficients tend to the gaussian function. Plotted are the coefficients of $Z^t$ in $(0.5 + 0.5Z)^{20}$.

Given a sample of a zero-mean random time series $x_t$, we may define another series $y_t$ by $y_t = x_t^2$. The problem of estimating the variance $\sigma^2 = p$ of $x_t$ is identical to the problem of estimating the mean $m$ of $y_t$. If the sample is short, we may expect an error $\Delta p$ in our estimate of the variance. Thus, in a scientific paper one would like to write for the mean
\begin{eqnarray*}
m &=& \hat{m} \pm \Delta m \qquad (33) \\
&=& \hat{m} \pm \sigma/\sqrt{n} \qquad (34)
\end{eqnarray*}
but since the variance $\sigma^2$ often is not known either, it is necessary to use the estimated $\hat{\sigma}$, that is  
\begin{displaymath}
m = \hat{m} \pm \hat{\sigma}/\sqrt{n} \qquad (35)
\end{displaymath}

Of course (35) really is not right, because we should add something to indicate the additional uncertainty due to error in $\hat{\sigma}$. That estimated error would again have an error, ad infinitum. To express the result properly, it is necessary to have a probability density function with which to calculate all the $E(x^n)$ that are required. The probability function can be either estimated from the data or chosen theoretically. In practice, for a reason given in a later section, the gaussian function often occurs. In the exercises it is shown that
\begin{displaymath}
\Delta p = p \sqrt{2 \over n} \qquad (36)
\end{displaymath}
Since $\Delta t = n\tau$, by squaring we have  
\begin{displaymath}
\left( {\Delta p \over p} \right)^2 {\Delta t \over 2\tau} \quad\geq\quad 1 \qquad (37)
\end{displaymath}
The inequality applies when the random numbers $x_t$ are not totally unpredictable. If $x_t$ is an approximation to a continuous function, then it is highly predictable and there will be a lot of slack in the inequality.
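Equation (36) can likewise be checked by simulation for gaussian $x_t$ (a sketch assuming NumPy; $n$, $p$, and the ensemble size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 1.0        # sample length and true variance p = sigma^2

# Estimate the variance p from many independent gaussian samples of length n.
p_hats = (rng.normal(0.0, np.sqrt(p), size=(20000, n)) ** 2).mean(axis=1)

print(p_hats.std(), p * np.sqrt(2.0 / n))   # empirical scatter vs. (36)
```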

Correlation is a concept similar to cosine. A cosine measures the angle between two vectors; it is given by the dot product of the two vectors divided by their magnitudes:

\begin{displaymath}
c = { ({\bf x} \cdot {\bf y}) \over \left[ ({\bf x} \cdot {\bf x})\, ({\bf y} \cdot {\bf y}) \right]^{1/2} }
\end{displaymath}

Correlation is the same sort of thing, except that $x$ and $y$ are scalar random variables, so instead of a vector subscript they carry the implicit ensemble subscript. Correlation is defined by

\begin{displaymath}
c = { E(xy) \over \left[ E(x^2)\, E(y^2) \right]^{1/2} }
\end{displaymath}

In practice one never has an ensemble. There is a practical problem when the ensemble average is simulated by averaging over a sample. The problem arises with small samples and is most dramatically illustrated for a sample with only one element. Then the sample correlation is

\begin{displaymath}
\hat{c} = { xy \over \vert x \vert\, \vert y \vert } = \pm 1
\end{displaymath}

regardless of what values the random numbers $x$ and $y$ happen to take. In fact, it turns out that the sample correlation $\hat{c}$ will always scatter away from zero.

No doubt this accounts for many false ``discoveries''. The topic of bias and variance of coherency estimates is a complicated one, but a rule of thumb seems to be to expect bias and variance of $\hat{c}$ on the order of $1/\sqrt{n}$ for samples of size n.
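This $1/\sqrt{n}$ scatter is easy to see by simulation (a sketch assuming NumPy; the sample sizes and ensemble size are arbitrary choices). Here $x$ and $y$ are independent, so the true correlation is zero, yet the sample correlation scatters away from zero:

```python
import numpy as np

rng = np.random.default_rng(4)
scatter = {}

for n in (4, 16, 64, 256):
    x = rng.normal(size=(3000, n))
    y = rng.normal(size=(3000, n))
    # Sample correlation of each row, with the ensemble average E
    # replaced by an average over the n points of the sample.
    c_hat = (x * y).sum(axis=1) / np.sqrt(
        (x * x).sum(axis=1) * (y * y).sum(axis=1))
    scatter[n] = c_hat.std()
    print(n, scatter[n], 1.0 / np.sqrt(n))   # scatter shrinks like 1/sqrt(n)
```

A correlation of 0.5 computed from four points is therefore unremarkable; the same value from 256 points would not be.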


  1. Suppose the mean of a sample of random numbers is estimated by a triangle weighting function, i.e.,

\begin{displaymath}
\hat{m} = s \sum^n_{i = 0} (n - i)\, x_i
\end{displaymath}

    Find the scale factor $s$ so that $E(\hat{m}) = m$. Calculate $\Delta m$. Define a reasonable $\Delta t$. Examine the uncertainty relation.
  2. A random series $x_t$ with a possibly time-variable mean may have the mean estimated by the feedback equation

\begin{displaymath}
\hat{m}_t = (1 - \epsilon)\, \hat{m}_{t-1} + b x_t
\end{displaymath}

    Express $\hat{m}_t$ as a function of $x_t, x_{t-1}, \ldots$, and not $\hat{m}_{t - 1}$.
    What is $\Delta t$, the effective averaging time?

    Find the scale factor b so that if mt = m, then $E(\hat{m}_t) = m.$

    Compute the random error $\Delta m = [E(\hat{m} - m)^2]^{1/2}$ [the answer goes to $\sigma (\epsilon/2)^{1/2}$ as $\epsilon$ goes to zero].

    What is $(\Delta m)^2 \Delta t$ in this case?

  3. Show that

\begin{displaymath}
(\Delta P)^2 = {1 \over n} \left[ E(x^4) - \sigma^4 \right]
\end{displaymath}

  4. Define the behavior of an independent zero-mean time series $x_t$ by defining the probabilities that various amplitudes will be attained. Calculate $E(x_i)$, $E(x_i^2)$, and $(\Delta P)^2$. If you have taken a course in probability theory, use a gaussian probability density function for $x_i$. HINT:

\begin{displaymath}
P(x) = { 1 \over \sigma \sqrt{2\pi} }\, e^{-x^2/2\sigma^2}
\end{displaymath}


\begin{displaymath}
\int^{\infty}_0 x^{2n} e^{-ax^2}\, dx = { 1 \cdot 3 \cdot 5 \cdots (2n - 1) \over 2^{n+1} a^n }\, \sqrt{\pi \over a}
\end{displaymath}

Stanford Exploration Project