The inner loop of the parallel migration tends to be bound by indirect memory addressing of the trace sample values at the migration operator times t_k. Simple linear interpolation of a trace value at time t_k requires two indirect memory accesses. A simple nearest-neighbor triangle interpolation, in which the triangle peak is placed at the sample location nearest to t_k, requires three. A general triangle interpolation, in which the filter coefficients fall on non-sampled time locations, requires six. I use the nearest-neighbor triangle anti-aliasing scheme because it is only 1.5x slower than simple linear interpolation (three accesses versus two), whereas the general triangle interpolation is an unacceptable 3x slower (six accesses versus two). The drawback of the nearest-neighbor triangle filters is that they smooth more than necessary in unaliased regimes, where linear interpolation would have sufficed. Since the traces have been doubly integrated, and I can neither afford to keep a copy of the non-integrated input traces nor easily switch to the six-point general triangle filters within a SIMD algorithm flow, I have elected to always smooth over a minimum of three adjacent points on the time axis. This is fine for data whose energy lies mostly below half-Nyquist, but otherwise it may over-smooth and cause an unnecessary loss of some bandwidth.
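The three-access lookup can be illustrated with a minimal serial sketch. It assumes the double integration is a causal running sum followed by an anti-causal one, that the triangle is normalized to unit area (the 1/k^2 factor), and that the trace is zero-padded by at least the largest half-width. The names `double_integrate`, `nn_triangle_sample`, `pad`, and `half_width` are hypothetical and chosen only for illustration; this is not the SIMD implementation itself.

```python
from itertools import accumulate


def double_integrate(trace, pad):
    """Causal then anti-causal running sum of a zero-padded trace.

    `pad` zeros are added on each side so that the three-point lookup
    never runs off the ends (assumes pad >= largest half-width used).
    """
    padded = [0.0] * pad + list(trace) + [0.0] * pad
    causal = list(accumulate(padded))           # c[n] = sum of x[i] for i <= n
    anti = list(accumulate(reversed(causal)))   # running sum taken from the end
    anti.reverse()                              # z[n] = sum of c[i] for i >= n
    return anti


def nn_triangle_sample(z, pad, t_k, dt, half_width):
    """Triangle-filtered trace value with the peak at the sample nearest t_k.

    Only three indirect accesses into the doubly integrated trace z are
    needed: z[j - k], z[j], and z[j + k].
    """
    j = int(round(t_k / dt)) + pad   # nearest-neighbor sample index (padded)
    k = half_width                   # triangle half-width in samples, k >= 1
    return (-z[j - k] + 2.0 * z[j] - z[j + k]) / (k * k)


# Usage: a unit spike at sample 3, sampled every 4 ms.
trace = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
z = double_integrate(trace, pad=2)
peak = nn_triangle_sample(z, pad=2, t_k=0.012, dt=0.004, half_width=2)   # 0.5
flank = nn_triangle_sample(z, pad=2, t_k=0.008, dt=0.004, half_width=2)  # 0.25
```

With `half_width=1` the three-point difference returns the original sample unchanged, so in this formulation any smoothing begins at a half-width of two samples. Under the same assumptions, a general triangle would need each of its three corner values linearly interpolated between two samples of `z`, which is where the six accesses come from.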