One method for restoring missing data is to require that the restored data, after specified filtering, has minimum energy. Specifying the filter chooses the interpolation philosophy. Generally the filter is a "roughening" filter. When a roughening filter runs off the end of smooth data, it typically produces a big end transient. Minimizing energy therefore implies a choice of the unknown data values at the end that minimizes this transient. We will examine five cases and then make some generalizations.
Let m denote a missing value. The dataset on which the examples are based is . Using subroutine miss1(), values were found to replace the missing m values so that the power in the filtered data is minimized. Figure 2 shows interpolation of the dataset with 1-Z as a roughening filter. The interpolated data matches the given data where they overlap.
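The minimization described above can be sketched in a few lines of Python. This is not the miss1() subroutine itself: it is a minimal illustration that solves the normal equations directly. The names convolve_full, solve, and fill_missing, and the use of None to stand for the missing value m, are assumptions made for this example. The sketch also assumes transient convolution, so the filter is allowed to run off both ends of the data.

```python
def convolve_full(filt, data):
    """Transient convolution: output has len(data) + len(filt) - 1 samples."""
    out = [0.0] * (len(data) + len(filt) - 1)
    for i, d in enumerate(data):
        for j, f in enumerate(filt):
            out[i + j] += f * d
    return out


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fill_missing(data, filt):
    """Replace None entries of data so that the filtered data has minimum energy.

    Hypothetical helper for illustration; known values are left unchanged.
    """
    n = len(data)
    unknown = [i for i, d in enumerate(data) if d is None]
    if not unknown:
        return [float(d) for d in data]
    known = [0.0 if d is None else float(d) for d in data]
    r0 = convolve_full(filt, known)  # filter output with unknowns set to zero
    # One column per unknown: the filter's response to a unit spike at that sample.
    cols = []
    for i in unknown:
        spike = [0.0] * n
        spike[i] = 1.0
        cols.append(convolve_full(filt, spike))
    # Normal equations (Fu^T Fu) x = -Fu^T r0 for the unknown samples x.
    a = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [-sum(p * q for p, q in zip(ci, r0)) for ci in cols]
    filled = known[:]
    for i, v in zip(unknown, solve(a, b)):
        filled[i] = v
    return filled


# Two missing values between 1 and 4, roughened with the filter (1,-1).
print(fill_missing([1.0, None, None, 4.0], [1.0, -1.0]))
# approximately [1.0, 2.0, 3.0, 4.0], a straight line
```

A direct solver is used here only because the example is tiny; for long datasets an iterative solver would be the more natural choice.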
Figure 2 Top is given data. Middle is given data with interpolated values. Missing values seem to be interpolated by straight lines. Bottom shows the filter (1,-1), whose output has minimum power.
Figure 3 Top is the same input data as in Figure 2. Middle is interpolated. Bottom shows the filter (-1,2,-1). The missing data seems to be interpolated by parabolas.
Figure 4 Top is the same input. Middle is interpolated. Bottom shows the filter (1,-3,3,-1). The missing data is very smooth. It shoots upward high off the right end of the observations, apparently to match the data slope there.
Figure 5 The filter (-1,-1,4,-1,-1) gives interpolations with stiff lines. They resemble the straight lines of Figure 2, but they project through a cluster of given values instead of projecting to the nearest given value. Thus, this interpolation tolerates noise in the given data better than the interpolation shown in Figure 4.
Figure 6 Bottom shows the filter (1,1). The interpolation is rough. Like the given data itself, the interpolation has much energy at the Nyquist frequency. But unlike the given data, it has little zero-frequency energy.
Figures 2-6 illustrate that the rougher the filter, the smoother the interpolated data, and vice versa. Let us switch our attention from the residual spectrum to the residual itself. The residual for Figure 2 is the slope of the signal (because the filter 1-Z is a first derivative), and the slope is constant (uniformly distributed) along the straight lines where the least-squares procedure is choosing signal values. So these examples confirm the idea that the least-squares method abhors large values (because they are squared). Thus, least squares tends to distribute residuals uniformly in both time and frequency, to the extent the constraints allow.
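A small worked case makes the uniform residual concrete. Assume, purely for illustration, two known values a and b separated by two missing values x_1 and x_2, with the filter 1-Z. Only the three filter outputs that touch the unknowns matter, since the remaining outputs do not depend on x_1 or x_2:

```latex
\begin{align*}
E(x_1, x_2) &= (x_1 - a)^2 + (x_2 - x_1)^2 + (b - x_2)^2 \\
\frac{\partial E}{\partial x_1} = 0 &\;\Rightarrow\; 2x_1 = a + x_2, \qquad
\frac{\partial E}{\partial x_2} = 0 \;\Rightarrow\; 2x_2 = x_1 + b \\
x_1 &= \frac{2a + b}{3}, \qquad x_2 = \frac{a + 2b}{3} \\
x_1 - a &= x_2 - x_1 = b - x_2 = \frac{b - a}{3}
\end{align*}
```

The three residuals are equal, so the total change b - a is shared evenly among them, and the interpolated points lie on the straight line from a to b.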
This idea helps us answer the question, what is the best filter to use? It suggests choosing the filter to have an amplitude spectrum that is inverse to the spectrum we want for the interpolated data. A systematic approach is given in the next section, but I will offer a simple subjective analysis here. Looking at the data, I see that all points are positive. It seems, therefore, that the data is rich in low frequencies; thus the filter should contain something like (1-Z), which vanishes at zero frequency. Likewise, the data seems to contain energy at the Nyquist frequency, so the filter should contain (1+Z), which vanishes there. The result of using the filter (1-Z)(1+Z)=1-Z^2 is shown in Figure 7. This is my best subjective interpolation based on the idea that the missing data should look like the given data. The interpolation and extrapolations are so good that you can hardly guess which data values are given and which are interpolated.
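The filter product and its two spectral zeros are easy to check numerically. The following is a small sketch, with poly_mul and amplitude as hypothetical helper names chosen for this example: multiplying polynomials in Z is the same as convolving their coefficients, and the amplitude spectrum is the magnitude of the polynomial evaluated at Z = exp(i*omega).

```python
import cmath
import math


def poly_mul(a, b):
    """Multiply two polynomials in Z given as coefficient lists (a convolution)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def amplitude(filt, omega):
    """Magnitude of the filter polynomial at Z = exp(i*omega), omega in radians per sample."""
    return abs(sum(c * cmath.exp(1j * omega * k) for k, c in enumerate(filt)))


best = poly_mul([1, -1], [1, 1])  # coefficients of (1-Z)(1+Z)
print(best)                       # [1, 0, -1]

# The amplitude vanishes (to rounding error) at zero and Nyquist frequency
# and peaks halfway between them.
for name, omega in [("zero", 0.0), ("half Nyquist", math.pi / 2), ("Nyquist", math.pi)]:
    print(name, round(amplitude(best, omega), 12))
```

Because the filter suppresses both zero and Nyquist frequency, the minimum-energy solution is free to put energy at those two frequencies, which is what the given data appears to contain.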
Figure 7 Top is the same as in Figures 2 to 6. Middle is interpolated. Bottom shows the filter (1,0,-1), which comes from the coefficients of (1-Z)(1+Z). Both the given data and the interpolated data have significant energy at both zero and Nyquist frequencies.