The crudest assumption is that zero-valued data may be presumed in the padded area. To avoid an edge-diffraction artifact, the data must merge smoothly with the padding, so zero padding is a good assumption only if the data is already small near the side boundary. When data is zero padded, it is debatable whether it should be tapered (gradually scaled to zero) to match up smoothly with the zero padding. I prefer to avoid tapering the data; that amounts to falsifying it. Instead I prefer to pad the data not with zeros but with something that looks more like the data. A simple way is to replicate the last trace, scaling it downward with distance from the boundary. This works best when the stepout of the data matches the stepout of the extension. Any theory of optimum data padding has two essential ingredients: a noise model and a signal model. An ideal data extrapolation is rarely, if ever, available in practice. My other book addresses more directly the question of extending gathers.
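A minimal sketch of the replicated-trace padding described above, contrasted with plain zero padding. I assume a 2-D gather with time down the rows and traces along the columns; the function names and the geometric `decay` parameter are illustrative choices, not from the text, and the flat replication implicitly assumes zero stepout in the extension.

```python
import numpy as np

def pad_zeros(data, npad):
    """Crudest assumption: append npad zero-valued traces on the right."""
    nt, nx = data.shape
    return np.concatenate([data, np.zeros((nt, npad))], axis=1)

def pad_replicate_taper(data, npad, decay=0.8):
    """Pad on the right by replicating the last trace, scaling it
    downward with distance from the boundary.  `decay` is a
    hypothetical tuning parameter controlling how fast the
    replicated trace fades; replicating without shifting assumes
    the extension has zero stepout."""
    last = data[:, -1]
    pad = np.stack([last * decay ** (i + 1) for i in range(npad)], axis=1)
    return np.concatenate([data, pad], axis=1)
```

The scaled replica merges with the data at the boundary (no jump, hence no edge diffraction) and decays toward zero far from it, without falsifying the recorded traces by tapering them.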