
Paradox: large n vs. the ensemble average

Now for the paradox. Imagine $n \rightarrow \infty$ in Figure 8. Will we see the same limit as results from the ensemble average? Here are two contradictory points of view. On one hand, since the autocorrelation seems to tend to an impulse, and the Fourier transform of an impulse is a constant, the spectrum should flatten toward the ensemble average as $n \rightarrow \infty$. On the other hand, the spectra in Figure 8 stay rough no matter how large $n$ becomes, so the limit cannot be the smooth ensemble average.

We will see that the first idea contains a false assumption. The autocorrelation does tend to an impulse, but the fuzz around the sides cannot be ignored--although the fuzz tends to zero amplitude, it also tends to infinite extent, and the product of zero with infinity here tends to have the same energy as the central impulse.

To examine this issue further, let us see how these autocorrelations decrease to zero with $n$ (the number of samples). Figure 9 shows the autocorrelation as $n$ increases by factors of four; thus $\sqrt{n}$ increases by factors of two.

Figure 9: Autocorrelation as a function of the number of data points. The random-noise-series (even) lengths are 60, 240, and 960.


Each autocorrelation in the figure was normalized at zero lag. We see the deviations at nonzero lags of the normalized autocorrelation dropping off as $1/\sqrt{n}$. In particular, the ratios between the values at the first nonzero lags and the value at lag zero roughly fit $1/\sqrt{n}$. Notice also that the fluctuations drop off with lag. The drop-off reaches zero at a lag equal to the sample length, because the number of terms in the autocorrelation sum diminishes to zero at that lag. A first impression is that the autocorrelation fits a triangular envelope. More careful inspection, however, shows that the triangle bulges upward at wide offsets, i.e., at large values of $k$ (this is slightly clearer in Figure 8).
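These observations are easy to reproduce numerically. The sketch below (an illustrative check, assuming NumPy and Gaussian noise; `autocorr_lag` is a hypothetical helper name) estimates the unnormalized autocorrelation for the three lengths used in Figure 9 and confirms that the lag-one value, relative to lag zero, scales roughly as $1/\sqrt{n}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def autocorr_lag(x, k):
    """Unnormalized autocorrelation s_k = sum_{t=1}^{n-k} x_t x_{t+k}."""
    n = len(x)
    return np.dot(x[:n - k], x[k:])

for n in (60, 240, 960):
    # Average |s_1| / s_0 over many independent noise realizations
    ratios = []
    for _ in range(300):
        x = rng.standard_normal(n)
        ratios.append(abs(autocorr_lag(x, 1)) / autocorr_lag(x, 0))
    # Multiplying by sqrt(n) should give roughly the same number for each n
    print(n, np.mean(ratios) * np.sqrt(n))
```

If the ratio truly behaves like $1/\sqrt{n}$, the printed product should stay roughly constant as $n$ quadruples.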

Let us explain all these observations. Each lag of the autocorrelation is defined as
\begin{displaymath}
s_k \eq \sum_{t=1}^{n-k} x_t x_{t+k}
\end{displaymath} (36)
where $(x_t)$ is a sequence of zero-mean independent random variables. Thus, the expectations of the autocorrelations can be easily computed:
\begin{eqnarray}
\E(s_0) &\eq& \sum_{t=1}^{n}\E(x_t^2) \eq n\sigma^2_x \\
\E(s_k) &\eq& \sum_{t=1}^{n-k}\E(x_t)\E(x_{t+k}) \eq 0 \mbox{\hspace{0.5cm}(for $k\geq 1$)}
\end{eqnarray} (37)
In Figure 9, the value at lag zero is more or less $n\sigma^2_x$ (before normalization), the deviation being more or less the standard deviation (square root of the variance) of $s_0$. On the other hand, for $k\geq 1$, as $\E(s_k)=0$, the value of the autocorrelation is directly the deviation of $s_k$, i.e., something close to its standard deviation.

We now have to compute the variances of the sk. Let us write
\begin{displaymath}
s_k \eq \sum_{t=1}^{n-k} y_k(t) \mbox{\hspace{1.0cm}(where $y_k(t) = x_tx_{t+k}$)}
\end{displaymath} (39)
So $s_k=(n-k){\hat m_{y_k}}$, where ${\hat m_{y_k}}$ is the sample mean of $y_k$ with $n-k$ terms. If $k\neq 0$, then $\E(y_k)=0$, and we apply (33) to ${\hat m_{y_k}}$:
\begin{displaymath}
\E ({\hat m_{y_k}}^2) \eq {\sigma^2_{y_k} \over n-k}
\end{displaymath} (40)
The computation of $\sigma^2_{y_k}$ is straightforward:
\begin{displaymath}
\sigma^2_{y_k} \eq \E(x^2_tx^2_{t+k}) \eq \E(x_t^2)\E(x_{t+k}^2) \eq \sigma^4_x \;.
\end{displaymath} (41)
Thus, for the autocorrelation $s_k$:
\begin{displaymath}
\E(s_k^2) \eq (n-k)\sigma^2_{y_k} \eq (n-k)\sigma^4_x \eq {n-k\over n^2}(\E(s_0))^2
\end{displaymath} (42)
Finally, as $\E(s_k)=0$, we get
\begin{displaymath}
\sigma_{s_k} \eq \sqrt{\E(s_k^2)} \eq \E(s_0) \, {\sqrt{n-k} \over n}
\end{displaymath} (43)
This result explains the properties observed in Figure 9. As $n \rightarrow \infty$, all the nonzero lags tend to zero compared to the zero lag, since $\sqrt{n-k}/n$ tends to zero. Then, the first lags ($k \ll n$) yield the ratio $1/\sqrt{n}$ between the autocorrelations and the value at lag zero. Finally, the autocorrelations do not decrease linearly with $k$, because of the $\sqrt{n-k}$ factor.
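Equation (43) lends itself to a direct Monte Carlo check. The sketch below (illustrative only, assuming NumPy and unit-variance Gaussian noise, so that $\E(s_0)=n$ and the predicted standard deviation of $s_k$ is $\sqrt{n-k}$) compares the empirical deviation at several lags against $\sqrt{n-k}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 200, 2000
lags = [1, 50, 100, 150]

# Collect s_k over many independent realizations of unit-variance noise
samples = {k: [] for k in lags}
for _ in range(trials):
    x = rng.standard_normal(n)
    for k in lags:
        samples[k].append(np.dot(x[:n - k], x[k:]))

for k in lags:
    emp = np.std(samples[k])    # empirical standard deviation of s_k
    pred = np.sqrt(n - k)       # equation (43) with E(s_0) = n, sigma_x = 1
    print(k, emp / pred)        # near 1 at every lag
```

Because the deviation falls off like $\sqrt{n-k}$ rather than $n-k$, the envelope sags more slowly than a straight triangle, which is the upward bulge noted above.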

We can now explain the paradox. The energy of the nonzero lags (counting both positive and negative $k$) will be
\begin{displaymath}
{\cal E} \eq \sum_{k\neq 0}\E(s_k^2) \eq {(\E(s_0))^2\over n^2}\, 2\sum_{k=1}^{n}(n-k) \eq (\E(s_0))^2{n(n-1)\over n^2}
\end{displaymath} (44)
Hence there is a conflict: although each autocorrelation lag decreases to zero, the number of nonzero lags increases, and together they prevent the energy from decreasing to zero. The autocorrelation does not globally tend to an impulse function.
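The same experiment can total the energy outside lag zero. A minimal sketch (assuming NumPy; `energy_ratio` is a hypothetical helper name): summing $s_k^2$ over both positive and negative nonzero lags and dividing by $s_0^2$ gives a number near $n(n-1)/n^2$, i.e. near one, for every $n$:

```python
import numpy as np

rng = np.random.default_rng(2)

def energy_ratio(n, trials=500):
    """Average ratio of nonzero-lag energy, sum_{k!=0} s_k^2, to s_0^2."""
    ratios = []
    for _ in range(trials):
        x = rng.standard_normal(n)
        s = np.correlate(x, x, mode='full')[n - 1:]  # s_k for k = 0..n-1
        # Factor 2 counts negative lags, since s_{-k} = s_k
        ratios.append(2.0 * np.sum(s[1:] ** 2) / s[0] ** 2)
    return np.mean(ratios)

for n in (60, 240, 960):
    print(n, energy_ratio(n))  # stays near 1, not near zero
```

The ratio refuses to shrink as $n$ grows, which is the numerical face of the paradox.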

In the frequency domain, the spectrum $S(\omega)$ is now
\begin{displaymath}
S(\omega ) \eq {1\over n}(s_0 +s_1 \cos\omega + s_2 \cos 2 \omega + \cdots)
\end{displaymath} (45)
So $\E[S(\omega)]=(1/n)\E[s_0]=\sigma_x^2$, and the average spectrum is a constant, independent of the frequency. However, as the $s_k$ fluctuate more or less like $\E[s_0]/\sqrt{n}$, and as their count in $S(\omega)$ increases with $n$, we will observe that $S(\omega)$ also fluctuates; indeed,
\begin{displaymath}
S(\omega ) \eq {1\over n}\E[s_0] \pm {1\over n}\E[s_0] \eq \sigma^2_x\pm\sigma^2_x
\end{displaymath} (46)
This explains why the spectrum remains fuzzy: the fluctuation is independent of the number of samples, whereas the autocorrelation seems to tend to an impulse. In conclusion, the expectation (ensemble average) of the spectrum is not properly estimated by letting $n \rightarrow \infty$ in a sample.
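The persistent fuzz is easy to observe directly on the periodogram. A minimal sketch (assuming NumPy and unit-variance Gaussian noise, so $\sigma_x^2=1$): at any interior frequency bin, the mean of $|X(\omega)|^2/n$ is near 1, and its standard deviation is also near 1, whatever the value of $n$:

```python
import numpy as np

rng = np.random.default_rng(3)

for n in (60, 240, 960):
    # Periodogram |X(w)|^2 / n of white noise, over many realizations
    specs = np.array([np.abs(np.fft.rfft(rng.standard_normal(n))) ** 2 / n
                      for _ in range(400)])
    j = n // 4                  # an interior frequency bin
    print(n, specs[:, j].mean(), specs[:, j].std())  # both stay near 1
```

The standard deviation does not shrink with $n$: more data refines the frequency grid but does not smooth the spectral estimate.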

Stanford Exploration Project