Next: APPLICATION OF WAVELET-BASED COMPRESSION Up: Sun & Biondi: Data Previous: INTRODUCTION

# WAVELET-TRANSFORM BASED COMPRESSION ALGORITHM

Data compression has long been studied in the field of digital communication. Data compression techniques generally fall into two major families (Nelson, 1995):

• lossless compression
Lossless compression comprises those techniques guaranteed to reproduce the input dataset exactly after a compress/decompress cycle. Lossless compression is essentially a coding technique. There are many different coding algorithms, such as Huffman coding (Huffman, 1952), run-length coding (Storer, 1988), and arithmetic coding (Witten et al., 1987).
• lossy compression
Lossy data compression concedes a certain loss of accuracy in exchange for a higher compression ratio. Lossy compression proves effective when applied to digitized representations of analog phenomena. By their very nature, these representations are not perfect to begin with, so a mismatch between output and input is more acceptable. Most lossy compression techniques can be adjusted to different quality levels, gaining higher accuracy in exchange for less effective compression. A lossless coding stage is also a necessary component of every lossy compression approach.
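The difference between the two families can be illustrated with a minimal sketch. It uses Python's standard `zlib` module (a DEFLATE coder) as a stand-in for a lossless coder, and a uniform quantizer as a stand-in for a lossy step. The sample values, the step size, and the helper names `quantize` and `dequantize` are hypothetical and are not taken from any package discussed here.

```python
import zlib

# Lossless: the compress/decompress cycle reproduces the input exactly.
data = bytes([0, 0, 0, 5, 5, 0, 0, 0, 0, 7] * 20)
packed = zlib.compress(data)
restored_bytes = zlib.decompress(packed)

def quantize(samples, step):
    """Map each sample to the index of the nearest multiple of `step`."""
    return [round(x / step) for x in samples]

def dequantize(indices, step):
    """Map quantization indices back to sample values."""
    return [q * step for q in indices]

# Lossy: the cycle returns an approximation whose error is bounded by step / 2.
samples = [0.12, -0.49, 1.03, 0.27]   # hypothetical sample values
step = 0.25                           # hypothetical quality setting
restored = dequantize(quantize(samples, step), step)
max_error = max(abs(a - b) for a, b in zip(samples, restored))
```

A larger `step` gives fewer distinct indices to encode, and therefore better compression, at the cost of a larger `max_error`.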

Most lossy data compression algorithms follow a similar methodology: the original data are mathematically transformed to a new domain in which they are better organized for compression than in the original spatial-temporal domain. Therefore, the choice of mathematical transformation is crucial to the performance of a compression algorithm.

Among the many kinds of transformation, the wavelet transform (Daubechies, 1992) has been chosen to develop data compression algorithms. The wavelet transform differs substantially from the Fourier transform. In the Fourier domain, all the elements of the basis are active for all time t, i.e., they are non-local. Consequently, Fourier series converge very slowly when approximating a localized function (Cohen, 1992). The wavelet transform makes up for this deficiency of the Fourier transform. Wavelet basis functions are localized in both the time domain and the frequency domain. Therefore, a wavelet basis can provide a good approximation of a localized function with only a few terms.
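The convergence argument above can be checked numerically with a small sketch. We assume the Haar wavelet purely for simplicity (it is not the wavelet used in the package described later), take a unit spike as the extreme case of a localized function, and keep only the `k` largest coefficients in each domain. All function names are illustrative.

```python
import cmath
import math

def haar_forward(x):
    """Orthonormal multilevel Haar transform; len(x) must be a power of two."""
    out = list(x)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        dif = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + dif
        n = half
    return out

def haar_inverse(c):
    """Inverse of haar_forward."""
    out = list(c)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        out[:2 * n] = merged
        n *= 2
    return out

def dft(x):
    """Unitary discrete Fourier transform (naive O(n^2) version)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            / math.sqrt(n) for k in range(n)]

def idft(X):
    """Inverse of dft."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n))
            / math.sqrt(n) for t in range(n)]

def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:k])
    return [c if i in kept else 0 for i, c in enumerate(coeffs)]

def relative_error(x, y):
    num = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(x, y)))
    return num / math.sqrt(sum(abs(a) ** 2 for a in x))

n, k = 64, 8
spike = [0.0] * n
spike[20] = 1.0   # a perfectly localized function

haar_err = relative_error(spike, haar_inverse(keep_largest(haar_forward(spike), k)))
fourier_err = relative_error(spike, idft(keep_largest(dft(spike), k)))
```

The spike has only seven nonzero Haar coefficients (one per level plus the coarsest average), so eight terms reconstruct it to rounding error. Its Fourier coefficients all have equal magnitude, so eight of 64 terms retain only one eighth of the energy.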

Seismic data are localized in character, which is the main reason for choosing the wavelet transform for seismic data compression. The wavelet transform organizes the seismic data into subbands, each of which captures a different level of temporal and spatial detail. In general, the large subbands hold the high-frequency data in the temporal and spatial dimensions. Because the data are coherent along the spatial dimensions, most of the data in these subbands represent noise of little geophysical significance. The many small subbands at lower frequencies, however, contain more seismic information that should be retained. Figure 1 shows the main procedures of Chevron's compression package.

Figure 1: The main procedures of Chevron's compression package.
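The statement that the large subbands hold the high frequencies follows from the dyadic structure of the transform. The sketch below assumes a separable 2-D decomposition in which only the low-pass band is split again at each level; the names `LL`, `LH`, `HL`, `HH` and the function `dyadic_subbands` are illustrative conventions, not part of Chevron's package.

```python
def dyadic_subbands(n1, n2, levels):
    """List (name, level, rows, cols) for a separable dyadic decomposition.

    Each level halves both dimensions and produces three detail subbands
    (LH, HL, HH); only the low-pass band (LL) is decomposed further.
    """
    bands = []
    rows, cols = n1, n2
    for level in range(1, levels + 1):
        if rows % 2 or cols % 2:
            raise ValueError("each level needs even dimensions")
        rows, cols = rows // 2, cols // 2
        for name in ("LH", "HL", "HH"):
            bands.append((name, level, rows, cols))
    bands.append(("LL", levels, rows, cols))
    return bands

bands = dyadic_subbands(64, 64, 3)
total = sum(rows * cols for _, _, rows, cols in bands)
# Fraction of all samples that sit in the finest (highest-frequency) level.
finest_fraction = sum(r * c for _, lvl, r, c in bands if lvl == 1) / total
```

For a 64 by 64 panel, the three finest detail subbands hold three quarters of all samples, while the coarsest low-pass band holds only 8 by 8 of them.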

• Wavelet transform
Mathematically, there are numerous kinds of wavelet basis functions. In compression applications, the choice of wavelet is not very critical (Ergas, personal communication), as long as it is reasonably smooth. Chevron's compression package uses the wavelet introduced by Bradley and Brislawn (1994). Biorthogonal wavelet filters are employed along each dimension after a study of a wide range of different wavelet filters (Villasenor et al., 1995).
• Quantization
Among the four compression procedures, the choice of quantization method is the key to the success of this lossy compression technique. The quantization step is also the one that introduces the loss of information.
• Run-length encoding and Huffman encoding
Chevron's package uses two coding algorithms, run-length encoding and Huffman encoding. Each has its own advantages for different components of the dataset.
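To make the two coding steps concrete, here is a minimal sketch in which run-length encoding collapses the long runs of zeros that quantization typically leaves behind, and a Huffman code then assigns short codewords to the most frequent runs. How Chevron's package actually combines the two coders is not described here, so the chaining below, the sample sequence, and all function names are assumptions.

```python
import heapq
from collections import Counter

def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

def huffman_code(symbols):
    """Return {symbol: bitstring} for a Huffman code built from symbol counts."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, unique tiebreak, {symbol: partial codeword}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (n0 + n1, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def huffman_encode(symbols, code):
    return "".join(code[s] for s in symbols)

def huffman_decode(bits, code):
    inverse = {b: s for s, b in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:   # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical quantized coefficients: mostly zeros with a few nonzero values.
quantized = [0] * 6 + [1] + [0] * 6 + [1] + [0] * 6 + [-2] + [0] * 6
runs = rle_encode(quantized)
code = huffman_code(runs)
bits = huffman_encode(runs, code)
recovered = rle_decode(huffman_decode(bits, code))
```

In this example the 27 samples reduce to seven runs and a 10-bit stream. The count ignores the cost of storing the code table, which a real coder must also transmit.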

In Figure 2, we show a 2-D schematic representation of the wavelet subbands of the seismic data. Different subbands contain different seismic information, so it is crucial to choose a different quantization procedure for each subband. For example, the low-frequency, low-wavenumber subband contains most of the seismic reflection energy. We should treat this part very carefully; in other words, most of it should be preserved. The high-frequency, high-wavenumber subband contains only a small fraction of the seismic information, such as high-frequency diffraction hyperbolas, which is nonetheless very important in seismic interpretation. Most of the data samples in this subband are uncorrelated noise, and losing them has no observable effect on the dataset as a whole.

Figure 2: Schematic representation of the wavelet subbands of the seismic dataset.
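Subband-dependent quantization can be sketched with a uniform quantizer whose step size varies by subband. The coefficient values, the step sizes, and the two-band split below are hypothetical; the quantization rules actually used by Chevron's package are not given here.

```python
def quantize_subbands(subbands, steps):
    """Quantize every subband with its own step size; returns integer indices."""
    return {name: [round(c / steps[name]) for c in coeffs]
            for name, coeffs in subbands.items()}

def dequantize_subbands(indices, steps):
    """Map the integer indices of every subband back to coefficient values."""
    return {name: [q * steps[name] for q in qs] for name, qs in indices.items()}

# Hypothetical coefficients: the low band carries reflection energy, the high
# band is mostly weak noise plus one stronger, diffraction-like coefficient.
subbands = {
    "low": [10.2, -7.9, 5.1, 3.3],
    "high": [0.04, -0.03, 0.9, 0.02, -0.01, 0.05],
}
steps = {"low": 0.1, "high": 0.5}   # fine step for "low", coarse for "high"

indices = quantize_subbands(subbands, steps)
restored = dequantize_subbands(indices, steps)
low_error = max(abs(a - b) for a, b in zip(subbands["low"], restored["low"]))
```

With these settings the low band is preserved to within half of its fine step, while the weak samples in the high band quantize to zero and only the strong coefficient survives. The resulting runs of zeros are what make the subsequent coding stage effective.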

In practice, reaching a high compression ratio with good quality requires implementing this technique in a high-dimensional space. Chevron's algorithm performs compression and decompression on 4-D blocks of seismic data (which include 2-D and 3-D blocks as special cases). All blocks must have the same size, n1*n2*n3*n4. n1 is the time axis, while n2, n3, and n4 can represent any spatial axis, such as the offset, shot, CDP, streamer, inline, or crossline direction. n1 and n2 must be greater than one, and should generally be more than 16 for reasonable performance. n3 and n4 can be one or greater.
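The block-size constraints can be written down as a short sketch. The functions `check_block_shape` and `block_origins` are hypothetical helpers, not part of Chevron's package, and the requirement that each data dimension be a multiple of the block dimension is our own simplifying assumption, since the handling of partial blocks is not described here.

```python
from itertools import product

def check_block_shape(n1, n2, n3=1, n4=1):
    """Validate a block shape against the constraints stated in the text.

    Returns True when n1 and n2 also exceed the suggested minimum of 16.
    """
    if n1 <= 1 or n2 <= 1:
        raise ValueError("n1 and n2 must be greater than one")
    if n3 < 1 or n4 < 1:
        raise ValueError("n3 and n4 must be one or greater")
    return n1 > 16 and n2 > 16

def block_origins(data_shape, block_shape):
    """Return the starting index of every equally sized block in the dataset."""
    for size, blk in zip(data_shape, block_shape):
        if size % blk != 0:
            # Assumption of this sketch: no partial blocks.
            raise ValueError("data size must be a multiple of the block size")
    ranges = [range(0, size, blk) for size, blk in zip(data_shape, block_shape)]
    return list(product(*ranges))

data_shape = (1024, 64, 32, 1)    # hypothetical (time, offset, CDP, unused) sizes
block_shape = (256, 32, 16, 1)    # n1, n2, n3, n4
recommended = check_block_shape(*block_shape)
origins = block_origins(data_shape, block_shape)
```

Setting n3 or n4 to one, as in the last axis here, reduces the 4-D block to a 3-D or 2-D block without changing the code path.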

Stanford Exploration Project
11/12/1997