
Generalized Lloyd's algorithm

The concept of quantization originates in the field of electrical engineering. The basic idea behind quantization is to describe a continuous function, or one with a large number of samples, by a few representative values. Let x denote the input signal and $\hat{x}=Q(x)$ denote the quantized values, where $Q(\cdot)$ is the quantizer mapping function. Representing x by $\hat{x}$ inevitably introduces distortion. In the least-squares sense, the distortion can be measured by
\begin{displaymath}
D= \int_{-\infty}^{\infty}(x-Q(x))^2f(x)dx,\end{displaymath} (1)
where f(x) is the probability density function of the input signal. Consider the situation with L quantizers $\hat{x}=(\hat{x}_1, \hat{x}_2, \cdots, \hat{x}_L)$. Let the corresponding quantization intervals be
\begin{displaymath}
T_i=(a_{i-1}, a_i), i=1,2,\ldots,L,\end{displaymath} (2)
where $a_0=-\infty$ and $a_L=\infty$. Then  
 \begin{displaymath}
D=\int_{-\infty}^{\infty}(x-Q(x))^2f(x)dx=\sum_{i=1}^L\int_{T_i}(x-\hat{x}_i)^2f(x)dx.\end{displaymath} (3)
In the discrete case, equation (3) can be written as follows:  
 \begin{displaymath}
D=\sum_{i=1}^L \sum_{x=a_{i-1}}^{a_i}P(x)(x-\hat{x}_i)^2,\end{displaymath} (4)
where P(x) is the discrete version of the probability density function, or normalized histogram ($\sum_x P(x)=1$). To minimize the distortion function D, we take derivatives of equation (4) with respect to $\hat{x}_i$ and $a_i$ and set them equal to zero, leading to the following conditions for the optimum quantizers $\hat{x}_i$ and quantization interval boundaries $\hat{a}_i$:
\begin{eqnarray}
\hat{a}_i & = & \frac{\hat{x}_i+\hat{x}_{i+1}}{2}, \\
\hat{x}_i & = & \frac{\sum_{x=\hat{a}_{i-1}}^{\hat{a}_i}P(x)\,x}{\sum_{x=\hat{a}_{i-1}}^{\hat{a}_i}P(x)}.
\end{eqnarray} (5)
(6)

A way to solve this coupled set of nonlinear equations is to first generate an initial set of quantizers $\{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_L\}$, then apply equations (5) and (6) alternately until convergence is obtained. This iteration is well known as the Lloyd-Max quantization algorithm (LMQ).
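To make the alternation concrete, the following NumPy sketch applies equations (5) and (6) to a 1D normalized histogram until the levels stop moving. The function name lloyd_max_1d, the uniform initialization, and the stopping test are illustrative assumptions, not details taken from the text.

\begin{verbatim}
import numpy as np

def lloyd_max_1d(values, P, L, n_iter=100, tol=1e-8):
    """Alternate equations (5) and (6) on a normalized 1D histogram."""
    values = np.asarray(values, dtype=float)
    P = np.asarray(P, dtype=float)
    # Initial quantizers: spread uniformly over the data range.
    xhat = np.linspace(values.min(), values.max(), L)
    for _ in range(n_iter):
        # Equation (5): interval boundaries are midpoints of adjacent levels.
        a = 0.5 * (xhat[:-1] + xhat[1:])
        # Assign every histogram bin x to its interval T_i.
        idx = np.searchsorted(a, values)
        new = xhat.copy()
        for i in range(L):
            w = P[idx == i]
            # Equation (6): each level is the P-weighted centroid of T_i.
            if w.sum() > 0:
                new[i] = np.dot(w, values[idx == i]) / w.sum()
        if np.max(np.abs(new - xhat)) < tol:
            return new
        xhat = new
    return xhat

# Example: a 4-level quantizer for a Gaussian-shaped histogram.
x = np.arange(256, dtype=float)
h = np.exp(-0.5 * ((x - 128.0) / 20.0) ** 2)
levels = lloyd_max_1d(x, h / h.sum(), L=4)
\end{verbatim}

Because each interval keeps its previous level when it contains no probability mass, the sketch does not yet address the empty-cluster problem discussed later in this section.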

It is fairly straightforward to extend the LMQ to 3D. The main extension is to break the input data points into 3D clusters instead of 1D intervals. The 1D normalized histogram is replaced with a 3D normalized histogram, and a 3D distortion function is calculated as follows:
\begin{displaymath}
D=\sum_{m=1}^L\left(\sum_{x=a_{i-1}}^{a_i}\sum_{y=b_{j-1}}^{b_j}\sum_{z=c_{k-1}}^{c_k}P(x,y,z)\,(\mathbf{r}-\mathbf{\hat{r}})\cdot(\mathbf{r}-\mathbf{\hat{r}})\right),\end{displaymath} (7)
where $\mathbf{\hat{r}}=(\hat{x}_m,\hat{y}_m,\hat{z}_m)$ is the center of cluster m, P(x,y,z) is the 3D histogram at point $\mathbf{r}=(x,y,z)$ within cluster m, and (ai-1,ai), (bj-1, bj), (ck-1, ck) define the boundaries of cluster m. Conditions (5) and (6) accordingly become:

1.
Classification of the points. The Euclidean distance between a point and every cluster center (i.e., quantizer) is calculated, and the point is assigned to the closest cluster center.
2.
Updating of the quantizers. Each quantizer is moved to the center of mass of all points assigned to its cluster (see the sketch below).
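
Applied to the weighted 3D histogram, these two steps amount to a weighted k-means iteration. Below is a minimal NumPy sketch under our own assumptions (the function name lloyd_3d, random initial centers, and a fixed iteration count are illustrative); it is a sketch of the two-step iteration, not Clapp's implementation.

\begin{verbatim}
import numpy as np

def lloyd_3d(points, P, L, n_iter=50, seed=0):
    """Two-step 3D iteration: classify points, then recompute
    the weighted centers of mass.

    points : (N, 3) coordinates r = (x, y, z) of histogram cells
    P      : (N,) normalized 3D histogram values at those cells
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), L, replace=False)].copy()
    for _ in range(n_iter):
        # Step 1: classification -- squared Euclidean distance
        # from every point to every cluster center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        label = d2.argmin(axis=1)
        # Step 2: update -- P-weighted center of mass of each cluster.
        for m in range(L):
            sel = label == m
            if P[sel].sum() > 0:
                centers[m] = (P[sel][:, None] * points[sel]).sum(axis=0) \
                             / P[sel].sum()
    return centers, label
\end{verbatim}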

It should be pointed out that LMQ depends strongly on the initial solution and can easily get stuck in local minima. One simple but effective way to avoid these problems is to start with a uniform quantizer and, whenever a cluster becomes empty, replace it by splitting a region with high variance (Clapp, 2004). The detailed implementation of the modified 3D Lloyd's algorithm is discussed by Clapp (2006).
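As a rough illustration of that safeguard, the sketch below re-seeds an empty cluster near the cluster with the largest weighted scatter. The perturbation size and the variance measure are our illustrative choices, not necessarily those of Clapp (2004, 2006).

\begin{verbatim}
import numpy as np

def split_high_variance(centers, points, P, label, eps=1e-3, seed=0):
    """Re-seed each empty cluster inside the cluster with the
    largest P-weighted scatter about its center."""
    rng = np.random.default_rng(seed)
    L = len(centers)
    counts = np.bincount(label, minlength=L)
    # P-weighted scatter of each non-empty cluster.
    scatter = np.zeros(L)
    for j in range(L):
        sel = label == j
        if P[sel].sum() > 0:
            d2 = ((points[sel] - centers[j]) ** 2).sum(axis=1)
            scatter[j] = np.dot(P[sel], d2) / P[sel].sum()
    for m in np.where(counts == 0)[0]:
        j = scatter.argmax()
        # Place the empty quantizer next to the high-variance one,
        # perturbed slightly so the next classification separates them.
        centers[m] = centers[j] + eps * rng.standard_normal(3)
        scatter[j] = 0.0  # avoid splitting the same cluster twice
    return centers
\end{verbatim}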

