next up previous [pdf]

Next: Extensions Up: Clapp: Picking Previous: Viterbi

Lloyd

The concept of quantization originates in electrical engineering. The basic idea behind quantization is to describe a continuous function, or one with a large number of samples, by a few representative values. Let $x$ denote the input signal and $\hat{x}=Q(x)$ denote the quantized values, where $Q(\cdot)$ is the quantizer mapping function. Representing $x$ by $\hat{x}$ inevitably introduces distortion, which in the least-squares sense can be measured by

\begin{displaymath}
D=\sum_{i=1}^n (x_i-Q(x_i))^2.
\end{displaymath} (3)

Consider a quantizer with $L$ levels $\hat{x}=(\hat{x}_1, \hat{x}_2, \cdots, \hat{x}_L)$. Let the corresponding quantization intervals be
\begin{displaymath}
T_i=(a_{i-1}, a_i), i=1,2,\ldots,L,
\end{displaymath} (4)

where $a_0=\min(x)$ and $a_L=\max(x)$. The distortion function then becomes
\begin{displaymath}
D=\sum_{i=1}^L \sum_{x=a_{i-1}}^{a_i}P(x)(x-\hat{x}_i)^2,
\end{displaymath} (5)

where $P(x)$ is the discrete version of the probability density function, or normalized histogram ($\sum_x P(x)=1$). To minimize the distortion function $D$, we take derivatives of equation (5) with respect to $\hat{x}_i$, $a_i$ and set them equal to zero, leading to the following conditions for the optimum quantizers $\hat{x}_i$ and quantization interval boundaries $\hat{a}_i$:
\begin{displaymath}
\hat{a}_i = \frac{\hat{x}_i+\hat{x}_{i+1}}{2},
\end{displaymath} (6)

\begin{displaymath}
\hat{x}_i = \frac{\sum_{x=\hat{a}_{i-1}}^{\hat{a}_i}P(x)\,x}
{\sum_{x=\hat{a}_{i-1}}^{\hat{a}_i}P(x)}.
\end{displaymath} (7)

A way to solve this coupled set of nonlinear equations is to first generate an initial set of quantizer values $\{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_L \}$ and then apply equations (6) and (7) alternately until convergence is reached. This iteration is known as the Lloyd-Max quantization algorithm (LMQ). A common modification is to form the per-interval distortion

\begin{displaymath}
D(i) = \sum_{x=a_{i-1}}^{a_i}P(x)(x-\hat{x}_i)^2
,
\end{displaymath} (8)

and to remove $a_i$ where the distortion is small, and possibly to add $a$ values in regions where the distortion is large. The resulting number of $a$ locations is often much smaller than in the initial set.
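The alternating application of equations (6) and (7) can be sketched as follows. This is a minimal Python sketch of the Lloyd-Max update, assuming a sorted sample axis `x` and a normalized histogram `p`; the function name and arguments are illustrative, not taken from the original code:

```python
import numpy as np

def lloyd_max(x, p, xhat0, n_iter=100, tol=1e-8):
    """Lloyd-Max quantization of a discrete distribution.

    x     : sorted 1-D array of sample locations
    p     : normalized histogram P(x), with sum(p) == 1
    xhat0 : initial quantizer values (sorted)
    """
    xhat = np.asarray(xhat0, dtype=float).copy()
    for _ in range(n_iter):
        # Equation (6): boundaries are midpoints between quantizer values.
        a = np.concatenate(([x[0]], 0.5 * (xhat[:-1] + xhat[1:]), [x[-1]]))
        # Equation (7): each quantizer value is the P-weighted centroid
        # of its interval.
        new = xhat.copy()
        for i in range(len(xhat)):
            mask = (x >= a[i]) & (x <= a[i + 1])
            w = p[mask].sum()
            if w > 0:
                new[i] = (p[mask] * x[mask]).sum() / w
        if np.max(np.abs(new - xhat)) < tol:
            xhat = new
            break
        xhat = new
    return xhat
```

For a bimodal histogram the iteration moves the quantizer values toward the two mass concentrations, which is the behavior the coupled equations (6) and (7) encode.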

The LMQ scheme is designed to find the best representation of a distribution, which is not what I am trying to do in this instance. Instead I am trying to achieve a representation of $y(x)$ with as few $(x_i,y_i)$ points as possible. The twist on the standard LMQ scheme is the replacement of $P(x)$ in equation (5). Instead of using the probability density function, I construct an error from a background piecewise-linear function. I first construct $z(x)$ by linearly interpolating between the $(x_i,y_i)$ samples. I then calculate $d(x)=y(x)-z(x)+\min(y(x))$, the error from the piecewise-linear background. Figure 4 demonstrates the methodology. Figure 4a shows a curve, with `*' marking the initial $(x_i,y_i)$ points, and the resulting $z(x)$ function. Figure 4b shows the $d(x)$ function constructed from $y(x)$ and $z(x)$. We now have something approximating the shape of a probability density function, except that it can be positive or negative. To get around this problem I first form

\begin{displaymath}
s_i=\sum_{x=x_{i-1}}^{x_i} d(x),
\end{displaymath} (9)

then if $s_i$ is positive I define $P(x)$ over the interval as
\begin{displaymath}
P(x_{i-1}..x_i) = d(x_{i-1}..x_i) - \min\left(d(x_{i-1}..x_i)\right).
\end{displaymath} (10)

If $s_i<0$ I define
\begin{displaymath}
P(x_{i-1}..x_i) = -d(x_{i-1}..x_i) + \max\left(d(x_{i-1}..x_i)\right).
\end{displaymath} (11)
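The construction of the pseudo-density can be sketched in Python. This is a minimal sketch, assuming a dense curve `y(x)` and current picks `(xi, yi)`; the sign conventions are chosen so that the result is nonnegative on every interval, and the function name is illustrative:

```python
import numpy as np

def local_p(x, y, xi, yi):
    """Build the pseudo-density P(x) of equations (9)-(11).

    x, y   : the dense curve y(x)
    xi, yi : picked points defining the piecewise-linear background
    """
    z = np.interp(x, xi, yi)            # piecewise-linear background z(x)
    d = y - z + y.min()                  # deviation from the background
    p = np.zeros_like(d)
    # Process each interval [x_{i-1}, x_i] separately.
    for i in range(1, len(xi)):
        mask = (x >= xi[i - 1]) & (x <= xi[i])
        seg = d[mask]
        sn = seg.sum()                   # equation (9)
        if sn >= 0:
            p[mask] = seg - seg.min()    # shift up to nonnegative
        else:
            p[mask] = seg.max() - seg    # flip sign, then shift
    return p
```

The per-interval shift makes the smallest value in each interval exactly zero, so the center-of-mass calculation of equation (7) is always performed on a nonnegative weight.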

Figure 4. Panel (a) shows the original curve (solid line), an initial set of $a$ values (asterisks), and the background curve $z$ (dashed). Panel (b) shows the deviation $d$ from the piecewise-linear background.

As a result $P(x_{i-1}..x_i)$ is always nonnegative. Flipping the sign does not violate the LMQ concept: equation (7) performs a local center-of-mass calculation, and applying equation (10) or (11) simply transforms the coordinate system so that the center of mass is computed accurately. How accurately the curve is represented is determined by the number of $a_i$ terms. In practice it is best to start with a dense set of $a_i$ to avoid local minima and then use the fitting criterion of equation (8) to eliminate points in regions with small deviations. Figure 5 demonstrates this concept. The solid curve in Figure 5 is the original function; the three dashed curves show different deviation criteria. With increasing accuracy an increasing number of points is needed to represent the curve: in this example 2, 9, and 28 points are used.
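The pruning step can be sketched as a greedy loop. This is a simplified sketch in which a maximum-deviation criterion stands in for the interval distortion of equation (8); the function name and tolerance argument are illustrative:

```python
import numpy as np

def prune(x, y, xi, yi, tol):
    """Greedily remove picked points while the piecewise-linear fit
    stays within `tol` of the dense curve y(x).

    x, y   : the dense curve
    xi, yi : dense initial picks (lists, sorted in x)
    """
    xi, yi = list(xi), list(yi)
    changed = True
    while changed:
        changed = False
        for j in range(1, len(xi) - 1):        # never drop the endpoints
            trial_x = xi[:j] + xi[j + 1:]
            trial_y = yi[:j] + yi[j + 1:]
            err = np.abs(y - np.interp(x, trial_x, trial_y)).max()
            if err < tol:                      # deviation still small: drop it
                del xi[j], yi[j]
                changed = True
                break
    return np.array(xi), np.array(yi)
```

Loosening the tolerance removes more points, which mirrors the trade-off shown in Figure 5: a coarser deviation criterion yields far fewer $(x_i,y_i)$ points at the cost of fit accuracy.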

Figure 5. The effect of modifying the deviation criteria. In panel (a) the solid curve is the original function, and the three dashed curves show three different deviation criteria; the closer the fit to the original curve, the more points are needed for an accurate representation. Panel (b) shows the error of the fitting functions.




2009-04-13