
An information theory criterion

Suppose we have found an estimate of the prediction error filter of length $ M$ using the autocorrelation estimates $ R_0, \ldots, R_{M-1}$. In order to increase the operator length to $ M+1$, additional information is needed: namely, $ R_M$. A quantitative measure of the information in the operator is easily obtained from the average entropy, which we know is given by $ h_M^\prime = \frac{1}{2}\ln E_M$. Using (31), notice that

$\displaystyle h_{M+1}^\prime = \frac{1}{2}\ln E_M + \frac{1}{2}\ln\left(1-\vert C_{M+1}\vert^2\right) \le h_M^\prime.$ (A-37)

Thus, the entropy decreases as the operator length increases. The bound information (Brillouin, 1956) $ I_M^\prime$ in the power spectrum is therefore given by

$\displaystyle I_M^\prime = - h_M^\prime = -\frac{1}{2} \ln E_M,$ (A-38)

which, since (37) gives $ -h_M^\prime \le -h_{M+1}^\prime$, increases monotonically with $ M$, as it should.
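As a purely illustrative sketch (not part of the original derivation), the following Python fragment runs a Levinson-Durbin recursion on a hypothetical autocorrelation estimate and prints the average entropies $ h_M^\prime = \frac{1}{2}\ln E_M$; the function name, the synthetic data, and the biased autocorrelation estimate are assumptions made only for this example.

    import numpy as np

    def levinson_entropy(R, M):
        """Levinson-Durbin recursion: returns the prediction error powers
        E_1, ..., E_M and the entropies h_m' = 0.5 ln(E_m), given R = [R_0, ..., R_M]."""
        a = np.zeros(M + 1)
        a[0] = 1.0
        E = R[0]                                  # E_0 = R_0 (zero-length operator)
        errors, entropies = [], []
        for m in range(1, M + 1):
            C = -np.dot(a[:m], R[m:0:-1]) / E     # reflection coefficient C_m
            a[:m + 1] = a[:m + 1] + C * np.concatenate(([0.0], a[:m][::-1]))
            E = E * (1.0 - C ** 2)                # the relation behind (37): E_m <= E_{m-1}
            errors.append(E)
            entropies.append(0.5 * np.log(E))
        return np.array(errors), np.array(entropies)

    # hypothetical data and a biased autocorrelation estimate, for illustration only
    x = np.random.default_rng(0).standard_normal(200)
    R = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(6)])
    E, h = levinson_entropy(R, 5)
    print(h)    # h_1' >= h_2' >= ... : the entropy decreases as the operator lengthens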

If the autocorrelation values $ R_0, \ldots, R_{N-1}$ were known precisely, the bound information would continue to increase as we use all the estimates and let $ M \to N - 1$. But the $ R$'s are not precisely known. Because the data sample is finite, only $ N-n$ measurements enter the estimate of $ R_n$, whereas $ N$ measurements enter the estimate of $ R_0$. The quality of the information contained in $ R_0$ is correspondingly higher than that in $ R_n$. A quantitative measure of this change is therefore required.

For the moment, take Equation (19) as our estimate of the autocorrelation. Then, assuming that the $ X_i$'s are normally distributed, () shows that

$\displaystyle \mathrm{Var}\left(X_i^*X_{i+n}-R_n\right) = R_0^2 + R_n^2.$ (A-39)

Since $ \vert R_0\vert \ge \vert R_n\vert$ for all $ n$, and generally $ \vert R_0\vert \gg \vert R_n\vert$ for large $ n$, the variance (39) can be approximated by the constant $ R_0^2$. Viewing $ R_n$ as a measured quantity (which in fact it usually is not) and using standard arguments from measurement theory, we find that

$\displaystyle R_n = \frac{1}{N-n}\sum_mX_m^*X_{m+n} \pm 0.67\frac{R_0}{\sqrt{N-n}},$ (A-40)

with fifty percent confidence if $ R_n$ is also normally distributed.
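A minimal numerical sketch of Equation (40), assuming real-valued data and the unbiased lag estimate; the function name and the test series are hypothetical.

    import numpy as np

    def autocorr_with_probable_error(x, n):
        """Unbiased lag-n autocorrelation estimate together with its fifty-percent
        confidence half-width 0.67 R_0 / sqrt(N - n), as in Equation (40)."""
        x = np.asarray(x, dtype=float)
        N = len(x)
        R0 = np.dot(x, x) / N
        Rn = np.dot(x[:N - n], x[n:]) / (N - n)
        return Rn, 0.67 * R0 / np.sqrt(N - n)

    x = np.random.default_rng(1).standard_normal(500)   # hypothetical data sample
    for n in (0, 50, 450):
        Rn, pe = autocorr_with_probable_error(x, n)
        print(n, Rn, pe)      # the probable error grows like (N - n)^(-1/2)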

The probable error in $ R_n$ increases like $ (N-n)^{-1/2}$ as $ n \to N-1$. We imagine that the factor $ (N-n)^{-1/2}$ is proportional to the probability $ P(n)$ that an operator computed from $ R_0,\ldots, R_n$ is a worse estimate of the true operator than was the operator computed using only $ R_0, \ldots, R_{n-1}$. Since we know empirically that the estimate worsens as $ M \to N$ with probability one, the $ P(n)$ are normalized by writing:

$\displaystyle P(n) = \alpha (N-n)^{-\frac{1}{2}}$ (A-41)

and

$\displaystyle 1 = \sum_{n=0}^{N-1} P(n) \simeq \alpha\int_0^N(N-n)^{-\frac{1}{2}} dn = 2\sqrt{N} \alpha.$ (A-42)

Equation (42) determines the value of $ \alpha$ for the available data sample: $ \alpha \simeq 1/(2\sqrt{N})$.
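As a quick numerical check of the normalization (41)-(42) (assuming the continuum approximation, so that $ \alpha \simeq 1/(2\sqrt{N})$, and a hypothetical sample length):

    import numpy as np

    N = 1000
    alpha = 1.0 / (2.0 * np.sqrt(N))      # from Equation (42)
    n = np.arange(N)                      # lags 0, ..., N-1
    P = alpha * (N - n) ** -0.5           # Equation (41)
    print(alpha, P.sum())                 # the discrete sum is close to 1 for large N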

The average entropy of measurement error associated with an operator of length $ M$ is

\begin{displaymath}\begin{array}{lr} h_M^{\prime\prime} & = - \sum_{n=0}^{M-1} P(n)\ln P(n) \cr & \simeq - \int_0^M P(n)\ln P(n) dn. \end{array}\end{displaymath} (A-43)

The second line of (43) is valid for large $ N$. The value of $ h_M^{\prime\prime}$ increases as $ M$ increases in agreement with our intuition. It is well known that the largest average entropy for $ N$ probabilities is $ \ln N$. Letting $ M \to N$ in (43), we find

$\displaystyle h_N^{\prime\prime} = \ln\left(\frac{2N}{e}\right) < \ln N,$ (A-44)

which is consistent.
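A short numerical check of (43)-(44), using the discrete sum and a hypothetical sample length:

    import numpy as np

    N = 10000
    alpha = 1.0 / (2.0 * np.sqrt(N))      # from Equation (42)
    n = np.arange(N)
    P = alpha * (N - n) ** -0.5           # Equation (41)
    h_err = -np.sum(P * np.log(P))        # discrete form of Equation (43) with M = N
    print(h_err, np.log(2 * N / np.e), np.log(N))
    # h_N'' is close to ln(2N/e) and stays below the maximum possible entropy ln N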

Combining (38) and (43), the average information in the power spectrum can be quantitatively estimated using the expression

$\displaystyle I_M = - \left(h_M^\prime + h_M^{\prime\prime}\right) = -\frac{1}{2}\ln E_M + \int_0^M P(n)\ln P(n) dn.$ (A-45)

The first term increases while the second term decreases with increasing $ M$. A maximum will occur for some value $ 1 < M < N$. The spectrum with the maximum information is the optimum spectrum; the value of $ M$ that maximizes (45) is the value we are seeking.
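Before turning to an analytic approximation, here is a sketch of how the maximum of (45) could be located numerically. The error powers fed to it would normally come from the recursion that computes the operator; the power-law stand-in used below, the function name, and the sample length are assumptions made only for this example.

    import numpy as np

    def optimum_operator_length(E, N):
        """Given prediction error powers E[0] = E_1, ..., E[-1] = E_Mmax for a data
        sample of length N, return the M that maximizes I_M of Equation (45)."""
        E = np.asarray(E, dtype=float)
        alpha = 1.0 / (2.0 * np.sqrt(N))            # Equation (42)
        n = np.arange(len(E))
        PlnP = alpha * (N - n) ** -0.5 * np.log(alpha * (N - n) ** -0.5)
        I = -0.5 * np.log(E) + np.cumsum(PlnP)      # discrete form of Equation (45)
        return int(np.argmax(I)) + 1, I

    N = 400
    M = np.arange(1, N)
    E = M ** -2.0        # stand-in error powers, decaying as in (46) below with beta = 2
    best_M, I = optimum_operator_length(E, N)
    print(best_M)        # an interior maximum, well below N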

The values of (45) can be monitored continuously while the operator is being computed. However, an approximate analytic solution for the maximum can be found without making very restrictive assumptions about the behavior of $ E_M$. Numerical studies by the author on real seismic data have shown that $ E_M$ can be represented approximately by

$\displaystyle E_M \propto M^{-\beta},$ (A-46)

where $ \beta$ is a slowly varying function of $ M$. Generally, $ \beta$ is in the range $ 2 \ge \beta \ge \frac{1}{2}$, with $ \beta \simeq 2$ for small $ M$ and $ \beta \to \frac{1}{2}$ for large $ M$. Leaving $ \beta$ arbitrary for the moment, substituting (46) into (45), and finding the stationary point, we have

$\displaystyle \frac{\beta}{2}M^{-1} = - P(M)\ln P(M).$ (A-47)

Using (42) for $ \alpha$, Equation (47) can be solved graphically for $ M$. The solution for $ \beta = 2$ is plotted as the solid line in Figure 1.
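Equation (47) can equally be solved numerically. The following sketch simply scans candidate values of $ M$ for one hypothetical sample length with $ \beta = 2$; the function name is an assumption.

    import numpy as np

    def solve_eq47(N, beta=2.0):
        """Locate the M at which (beta/2) M^{-1} = -P(M) ln P(M), Equation (47),
        with alpha fixed by Equation (42)."""
        alpha = 1.0 / (2.0 * np.sqrt(N))
        M = np.arange(1, N)                          # candidate operator lengths
        P = alpha * (N - M) ** -0.5
        lhs, rhs = 0.5 * beta / M, -P * np.log(P)
        return int(M[np.argmin(np.abs(lhs - rhs))])  # the crossing of the two curves

    N = 2000
    print(solve_eq47(N), 2 * N / np.log(2 * N))      # the solution lies below the bound (48)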

An analytic bound on $ M$ can be obtained from (47) by noting that the right-hand side of (47) increases with $ M$, so its minimum value occurs at $ M = 0$, where it equals $ \ln(2N)/(2N)$. Thus, $ M$ has the very simple bound:

$\displaystyle M \le \beta\frac{N}{\ln 2N}.$ (A-48)

Since we have stated already that $ \beta \le 2$ in general, a useful bound on $ M$ for all $ N$ appears to be

$\displaystyle M \le \frac{2N}{\ln 2N}.$ (A-49)

Figure 1 compares the values of $ M$ obtained from (47), from (48), and from $ M = N/2$. The value $ \beta = 2$ is chosen because of the empirical evidence mentioned above and also because

$\displaystyle \vert h_M^\prime\vert \le \frac{\beta}{2}\ln M \le \ln N$ (A-50)

is valid for all $ M \le N$ only for $ \beta \le 2$. The comparison with $ M = N/2$ is of interest because various authors (including this one) have often found this value to be satisfactory for small $ N$. The derivation given above is strictly valid only for large $ N$. But the estimate (49) interpolates well between these extremes as is seen in Figure 1.
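For readers viewing the text without the figure, a compact numerical reproduction of the comparison in Figure 1 (the solution of (47) with $ \beta = 2$, the bound (49), and $ M = N/2$) over a few hypothetical sample lengths:

    import numpy as np

    def m_from_eq47(N, beta=2.0):
        # scan candidate M for the crossing of Equation (47)
        alpha = 1.0 / (2.0 * np.sqrt(N))
        M = np.arange(1, N)
        P = alpha * (N - M) ** -0.5
        return int(M[np.argmin(np.abs(0.5 * beta / M + P * np.log(P)))])

    print("    N   eq. (47)   2N/ln(2N)    N/2")
    for N in (50, 100, 500, 1000, 5000):
        print(f"{N:5d} {m_from_eq47(N):10d} {2 * N / np.log(2 * N):11.1f} {N // 2:6d}")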

Because the correspondence between $ P(n)$ and $ (N-n)^{-\frac{1}{2}}$ rests only on the heuristic argument given above, the results of this section should not be interpreted as rigorous estimates of the optimum operator length. Nevertheless, I believe that (47) and (49) are reasonable estimates of the operator length. The derivation was not founded on any assumptions about the type of stochastic process generating the time series. Hence, these estimates are definitely not intended as estimates of the order of some underlying autoregressive process. Rather, (49) is an upper bound on the operator length that will extract the most reliable information from a data sample of length $ N$. For example, suppose the time series $ \left\{X_1,\ldots, X_N\right\}$ is a realization of an autoregressive series of order $ L \le M$. Then computing the operator of length $ L$ should give the most efficient estimate of the spectrum, but computing the additional $ (M-L)$ terms should do little to alter that spectrum. Next, suppose the time series is a realization of an AR series of order $ L > M$. The arguments above indicate that we probably cannot obtain a really good estimate of the operator (or the spectrum), because our data sample is simply too small. The best we can hope to do is to compute the operator of length $ M$. In either case, when additional information about the underlying stochastic process is lacking, the best operational decision appears to be choosing $ M$ according to Equation (47) or (49).

Figure 1.
Operator length $ M$ as a function of data sample length $ N$ for three different operator-length estimates. The solid line is the solution of Equation (47). The dashed line is $ M = 2N/\ln(2N)$. The dot-dashed line is $ M = N/2$.

