The basic idea of least-squares fitting is that the residual is orthogonal to the fitting functions. Applied to the PE filter, this idea means that the output of a PE filter is orthogonal to lagged inputs. The orthogonality applies only for lags in the past, because prediction knows only the past while it aims to the future. What we want to show is different, namely, that the output is uncorrelated with itself (as opposed to the input) for lags in both directions; hence the output spectrum is white.
In (26) are two separate and independent autoregressions, $\mathbf{0} \approx \mathbf{Y}_a \mathbf{a}$ for finding the filter $\mathbf{a}$, and $\mathbf{0} \approx \mathbf{Y}_b \mathbf{b}$ for finding the filter $\mathbf{b}$. Evidently, if the matrices were infinitely tall, or if the values in the columns of the matrices got small towards the top and bottom, we would find that $\mathbf{a} = \mathbf{b}$ (in the display, dots stand for zeros).

$$
\mathbf{0} \;\approx\; \mathbf{r}_a \;=\;
\begin{bmatrix}
y_1 & \cdot & \cdot \\
y_2 & y_1 & \cdot \\
y_3 & y_2 & y_1 \\
y_4 & y_3 & y_2 \\
y_5 & y_4 & y_3 \\
y_6 & y_5 & y_4 \\
\cdot & y_6 & y_5 \\
\cdot & \cdot & y_6
\end{bmatrix}
\begin{bmatrix} 1 \\ a_1 \\ a_2 \end{bmatrix}
\qquad
\mathbf{0} \;\approx\; \mathbf{r}_b \;=\;
\begin{bmatrix}
\cdot & \cdot & \cdot \\
y_1 & \cdot & \cdot \\
y_2 & y_1 & \cdot \\
y_3 & y_2 & y_1 \\
y_4 & y_3 & y_2 \\
y_5 & y_4 & y_3 \\
y_6 & y_5 & y_4 \\
\cdot & y_6 & y_5
\end{bmatrix}
\begin{bmatrix} 1 \\ b_1 \\ b_2 \end{bmatrix}
\tag{26}
$$
Our goal is a different theorem that is imprecise when applied to the three-coefficient filters displayed in (26), but becomes valid as the filter length tends to infinity and the matrices become infinitely wide. Actually, all we require is that $b_n$ tend to zero as $n$ grows. This generally happens because, as $n$ increases, $y_{t-n}$ becomes a weaker and weaker predictor of $y_t$.
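As a concrete check, here is a minimal NumPy sketch (my own illustration; the MA(1) test signal, the filter length, and the helper name `pef` are choices of convenience, not part of the argument). It solves the two shifted regressions of (26) by least squares and confirms that the filter coefficients decay toward zero and that $\mathbf{a} \approx \mathbf{b}$ up to end effects. An MA(1) signal is used because its exact PEF is infinitely long, so the decay of $b_n$ is visible rather than trivially exact.

```python
import numpy as np

rng = np.random.default_rng(0)
n, nf = 200_000, 10                    # signal length, filter length
w = rng.standard_normal(n)
y = np.convolve(w, [1.0, 0.8])[:n]     # MA(1): its exact PEF is infinitely long

def pef(shift):
    # Columns are y delayed by shift, shift+1, ..., shift+nf-1.
    Y = np.stack([y[64 - k : n - k] for k in range(shift, shift + nf)], axis=1)
    # Pin the leading coefficient at 1 and fit the rest by least squares.
    f = np.linalg.lstsq(Y[:, 1:], -Y[:, 0], rcond=None)[0]
    return np.concatenate(([1.0], f))

a, b = pef(0), pef(1)                  # the two regressions of (26)
print(np.round(a, 3))                  # coefficients decay roughly like (-0.8)^n
print(np.max(np.abs(a - b)))           # tiny: both regressions find the same filter
```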
The matrix $\mathbf{Y}_a$ contains all of the columns that are found in $\mathbf{Y}_b$ except the last (and that last one will turn out to be irrelevant). This means that $\mathbf{r}_a$ is not only orthogonal to all of $\mathbf{Y}_a$'s columns (except the first), but is also orthogonal to all of $\mathbf{Y}_b$'s columns except the last. Although $\mathbf{r}_a$ is not truly perpendicular to the last column of $\mathbf{Y}_b$, it does not matter, because that column makes hardly any contribution to $\mathbf{r}_b$, since $|b_n| \ll 1$. Because $\mathbf{r}_a$ is (effectively) orthogonal to all the components of $\mathbf{r}_b$, $\mathbf{r}_a$ is also orthogonal to $\mathbf{r}_b$ itself. (For any vectors $\mathbf{x}$, $\mathbf{u}$, and $\mathbf{v}$: if $\mathbf{x} \cdot \mathbf{u} = 0$ and $\mathbf{x} \cdot \mathbf{v} = 0$, then $\mathbf{x} \cdot (\alpha \mathbf{u} + \beta \mathbf{v}) = 0$.)
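The same bookkeeping can be checked numerically. In the sketch below (same illustrative signal as above), $\mathbf{r}_a$ is orthogonal to the columns that $\mathbf{Y}_b$ shares with $\mathbf{Y}_a$ by the normal equations, only approximately orthogonal to $\mathbf{Y}_b$'s deepest-lag column, and therefore nearly orthogonal to $\mathbf{r}_b$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, nf = 200_000, 10
w = rng.standard_normal(n)
y = np.convolve(w, [1.0, 0.8])[:n]

def regression(shift):
    Y = np.stack([y[64 - k : n - k] for k in range(shift, shift + nf)], axis=1)
    f = np.linalg.lstsq(Y[:, 1:], -Y[:, 0], rcond=None)[0]
    return Y, Y @ np.concatenate(([1.0], f))   # residual r = Y (1, f1, ...)

def cosine(u, v):                              # normalized dot product
    return (u @ v) / np.sqrt((u @ u) * (v @ v))

Ya, ra = regression(0)
Yb, rb = regression(1)
print(cosine(ra, Yb[:, 0]))    # a column shared with Y_a: zero to rounding error
print(cosine(ra, Yb[:, -1]))   # the deepest-lag column of Y_b: merely small
print(cosine(ra, rb))          # hence the residuals are (nearly) orthogonal
```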
In choosing the example of (26), I have shifted the two fitting problems by one lag. We could draw the two problems again shifted by two lags, three lags, and more, and we would find that $\mathbf{r}_a$ and $\mathbf{r}_b$ are always orthogonal. Actually, $\mathbf{r}_a$ and $\mathbf{r}_b$ both contain the same signal, merely time-shifted. The orthogonality at all shifts means that the autocorrelation of the PEF output vanishes at all lags. The autocorrelation does not vanish at zero lag, however, because $\mathbf{r}_a$ is not orthogonal to the first column of $\mathbf{Y}_a$ (because we did not minimize with respect to $a_0$).
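A short sketch makes the whiteness visible: estimate one PEF, form its output (here, simply the in-sample residual), and look at the sample autocorrelation; it is close to an impulse. Signal and filter length are again illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, nf = 200_000, 10
w = rng.standard_normal(n)
y = np.convolve(w, [1.0, 0.8])[:n]

Y = np.stack([y[nf - k : n - k] for k in range(nf)], axis=1)
f = np.linalg.lstsq(Y[:, 1:], -Y[:, 0], rcond=None)[0]
r = Y @ np.concatenate(([1.0], f))         # the PEF output

r0 = r @ r                                 # zero lag: does not vanish
for lag in range(1, 6):
    print(lag, (r[lag:] @ r[:-lag]) / r0)  # every nonzero lag: ~0
```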
As we redraw $\mathbf{Y}_b$ for various lags, we may shift its columns only downward, because shifting them upward would bring in the first column of $\mathbf{Y}_a$, and the residual is not orthogonal to that. Thus we have proven only that one side of the autocorrelation of the PEF output vanishes. That is enough, however, because autocorrelation functions are symmetric, so if one side vanishes, the other must also.
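Spelled out for the sample autocorrelation used here (notation mine): with $\rho(k) = \sum_t r_t\, r_{t+k}$, substituting $t' = t - k$ gives

$$
\rho(-k) \;=\; \sum_t r_t\, r_{t-k} \;=\; \sum_{t'} r_{t'+k}\, r_{t'} \;=\; \rho(k),
$$

so the one-sided vanishing we proved implies the other side as well.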
We also see where the proof would break if $\mathbf{a}$ and $\mathbf{b}$ were two-sided filters like $(\ldots, b_{-2}, b_{-1}, 1, b_1, b_2, \ldots)$. If $\mathbf{b}$ were two-sided, $\mathbf{Y}_b$ would catch the nonorthogonal first column of $\mathbf{Y}_a$. Not only is $\mathbf{r}_a$ not proven to be perpendicular to the first column of $\mathbf{Y}_a$, it cannot be orthogonal to it: if it were, $\mathbf{r}_a$ would be orthogonal to every column of $\mathbf{Y}_a$, hence to every combination of them, including $\mathbf{r}_a$ itself, and a signal cannot be orthogonal to itself.
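A sketch of the failure mode (my own construction, reading "two-sided" as a filter with its center coefficient pinned to 1 and taps on both past and future lags): the residual autocorrelation no longer vanishes away from zero lag.

```python
import numpy as np

rng = np.random.default_rng(3)
n, half = 200_000, 5
w = rng.standard_normal(n)
y = np.convolve(w, [1.0, 0.8])[:n]

# Predict y_t from BOTH past and future lags, center coefficient fixed at 1:
# the case the one-sided argument above cannot handle.
lags = [k for k in range(-half, half + 1) if k != 0]
y0 = y[half : n - half]
Y = np.stack([y[half - k : n - half - k] for k in lags], axis=1)
f = np.linalg.lstsq(Y, -y0, rcond=None)[0]
r = y0 + Y @ f                             # two-sided residual

r0 = r @ r
for lag in range(1, 4):
    print(lag, (r[lag:] @ r[:-lag]) / r0)  # clearly nonzero: output is not white
```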
The consequence of this theorem is that the convolution of $\mathbf{a}$ with $\mathbf{y}$ is white noise (its autocorrelation is an impulse function), and that means that $\mathbf{a}$ and $\mathbf{y}$ have mutually inverse spectra. In other words, $\mathbf{a}$ captures a fundamental statistical aspect of $\mathbf{y}$. Where information is missing, we can use the PEF to guess it.
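The spectral statement can be checked crudely as well: multiply the PEF's power spectrum $|A(\omega)|^2$ by a segment-averaged periodogram estimate of the input spectrum $S(\omega)$, and the product comes out roughly flat. The FFT length, the segmenting, and the filter length below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, nf, nfft = 200_000, 20, 256
w = rng.standard_normal(n)
y = np.convolve(w, [1.0, 0.8])[:n]

Y = np.stack([y[nf - k : n - k] for k in range(nf)], axis=1)
a = np.concatenate(([1.0],
                    np.linalg.lstsq(Y[:, 1:], -Y[:, 0], rcond=None)[0]))

A2 = np.abs(np.fft.rfft(a, nfft)) ** 2      # PEF power spectrum |A(w)|^2
segs = y[: n - n % nfft].reshape(-1, nfft)  # crude averaged periodogram for S(w)
S = (np.abs(np.fft.rfft(segs, axis=1)) ** 2).mean(axis=0) / nfft
print(np.round(A2 * S, 2))                  # roughly constant: |A|^2 ~ 1/S
```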