The central idea of pre-whitening is to use two PE filters. Each PE filter minimizes the mean square energy of its output given its input. However, the second filter is given the output of the first filter. Once we have found the second filter, we apply it to the original input (the input that we used to train the first filter). Obviously, the second filter will not be optimal for removing the predictable events of the original input. As a matter of fact, the second filter is blind with regard to the events the first filter removed from the data. The second filter applied to the original input will remove the components it was trained to remove, but it will preserve the components that the first filter is capable of removing.
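The two-filter procedure described above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes a 1-D time series, a least-squares estimate of each PE filter, and causal convolution for filter application. The names `estimate_pef`, `apply_pef`, and `prewhiten` are hypothetical.

```python
import numpy as np


def estimate_pef(x, nlag):
    """Least-squares PE filter (1, -a_1, ..., -a_nlag) trained on the series x."""
    x = np.asarray(x, dtype=float)
    # Row t holds the nlag samples preceding x[t], most recent first.
    past = np.array([x[t - nlag:t][::-1] for t in range(nlag, len(x))])
    # Prediction coefficients that minimize the mean square prediction error.
    coeffs, *_ = np.linalg.lstsq(past, x[nlag:], rcond=None)
    return np.concatenate(([1.0], -coeffs))


def apply_pef(pef, x):
    """Causal convolution of x with the PE filter, truncated to len(x)."""
    return np.convolve(x, pef)[:len(x)]


def prewhiten(x, nlag1, nlag2):
    """Train PE filter 1 on x, PE filter 2 on its output, apply filter 2 to x."""
    pef1 = estimate_pef(x, nlag1)
    residual1 = apply_pef(pef1, x)
    # Skip the start-up samples of the first filter before training the second.
    pef2 = estimate_pef(residual1[nlag1:], nlag2)
    return pef1, pef2, apply_pef(pef2, x)
```

When the first filter already captures everything predictable in the input, the second filter is close to a unit spike, and applying it to the original input leaves the input almost unchanged. That is the sense in which the second filter preserves what the first filter removes.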
Imagine that an input time series contains three predictable components. Given this time series and a fixed filter length, we find the first optimal PE filter. The data component that this first filter removes is the first predictable component.
If the spectrum of the input time series is Y1, then a PE filter P1 corresponds to
P1 = 1 / <Y1>
where <Y1> represents a smooth version of the spectrum Y1. The output spectrum after such a PE filter step is
W1 = P1 Y1 = Y1 / <Y1>
I define Y2 as the output of the first PE filter
Y2 = W1 = Y1 / <Y1>
and the second PE filter P2 corresponds to
P2 = 1 / <<Y2>>
where <<>> represents a smoothing that might be different from the P1 smoothing <>. Applying P2 to Y2 yields
W2 = P2 Y2 = P2 W1
which indicates that W2 is whiter than W1.
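If we apply the second PE filter, P2, to the original data spectrum Y1, the output spectrum is P2 Y1. The simple identity
Y1 = <Y1> (Y1 / <Y1>)
allows us to introduce Y2 according to expression 1
P2 Y1 = <Y1> P2 Y2 = <Y1> W2
The output is therefore the whitened spectrum W2 multiplied by the smooth spectrum <Y1>, which is the component that the first filter is capable of removing.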
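The spectral argument can be checked numerically: applying P2 to Y1 should give the twice-whitened spectrum W2 multiplied by the smooth spectrum that P1 would have divided out. The sketch below is only an illustration under stated assumptions: Y1 is treated as a strictly positive power spectrum, each PE filter is modeled as the inverse of a smoothed spectrum, and the smoothing is a circular moving average (the text does not specify the smoothing operator). The names `smooth` and `two_stage_spectra` are hypothetical.

```python
import numpy as np


def smooth(spectrum, half_width):
    """Circular moving average over 2 * half_width + 1 frequency bins."""
    shifts = range(-half_width, half_width + 1)
    return sum(np.roll(spectrum, s) for s in shifts) / (2 * half_width + 1)


def two_stage_spectra(y1, half_width1, half_width2):
    """Return <Y1>, W2 and P2 Y1 for a strictly positive power spectrum y1."""
    smooth_y1 = smooth(y1, half_width1)   # smooth version of Y1
    p1 = 1.0 / smooth_y1                  # first PE filter spectrum
    y2 = p1 * y1                          # Y2 = W1, output of the first filter
    p2 = 1.0 / smooth(y2, half_width2)    # second PE filter, trained on Y2
    w2 = p2 * y2                          # W2, output of the second filter
    return smooth_y1, w2, p2 * y1         # last entry: P2 applied to Y1
```

For any positive input spectrum, the third returned array equals the product of the first two up to rounding error, whatever the two smoothing widths are. Using a different window for the second stage mirrors the remark that the two smoothings need not coincide.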