next up previous [pdf]

Next: APPENDIX C Up: APPENDIX B Previous: APPENDIX B

Division-free reciprocal cube roots

Sometime back in the '90s, square roots started to be implemented as $ z \times z^{-1/2}$ where the reciprocal square root was implemented using one or two iterations of Newton's method. As the Newton formula for the reciprocal square root could be written with only multiplications and additions, this was several times faster than computer division. Indeed, division was often replaced by squaring the reciprocal square root.

For the Lanczos root-finding methods in the previous appendix, a reciprocal cube root is needed. Fortunately, this, too, can be obtained using Newton's method in a division-free manner as follows:

Let

$\displaystyle f(x) = \frac{1}{x^3} - z
$

be the function whose root we want to find. Taking its derivative,

$\displaystyle f'(x) = \frac{-3}{\,\,x^4}$ ,

produces the Newton step

$\displaystyle x_{n+1}= x_n - \frac{f(x_n)}{f'(x_n)} =
x_n
({\textstyle \frac{4}{3}} - {\textstyle \frac{1}{3}} z \, x_n^3)$    .

The remaining issue is choosing an appropriate first guess, $ x_0$ , of the root in order to start the iteration. For this I again look to the fast reciprocal square root for guidance. McEniry (2007) reproduces a classic code (without the profane comment) containing a ``magic number'' from which half the integer representation of the floating point input is subtracted to produce an integer representation of the initial guess. This starting point was good enough that a single Newton iteration resulted in a worst case relative error of less than 0.175%. Mimicing McEniry's development yields the following code for a reciprocal cube root:

   float InvCubeRoot ( float x ) {
      const float onethird = 0.333333333333;
      const float fourthirds = 1.333333333333;
      float thirdx = x * onethird;
      union {
        int ix;
        float fx;
      } z;

      z.fx = x;
      z.ix = 0x54a21d2a - z.ix/3; /* magic */
      x = z.fx;
      x = x * ( fourthirds - thirdx * x*x*x ); /* max relerr < 2.34E-3 */
      x = x * ( fourthirds - thirdx * x*x*x ); /* max relerr < 1.09E-5 */
      return x;
   }

There is still one hitch--the ``magic'' line is not division free. Fortunately, the hacker and compiler community has worked out division-free integer division. For division by 3, this is accomplished by multiplying the numerator by the binary expansion 0.010101010101...of $ \frac{1}{3}$ in fixed point arithmetic just like we were all taught in elementary school. For 32 bit numerators, we multiply by the hexadecimal constant 55555556 and shift the (64 bit) result down by 32 binary places. Therefore the ``magic'' line of code becomes

      z.ix = 0x54a21d2a -
             (int) ((z.ix * (int64_t) 0x55555556)>>32); /* magic */
where the tail end 6 instead of 5 in the multiplier handles the cases where the integer is not an exact multiple of 3.

Performing timing tests in C with random numbers, this algorithm ran 20 times faster than calling powf(x,-1.0f/3.0f) from the C math runtime library and about 10 times faster than my best previous effort to calculate a fast reciprocal cube root.


next up previous [pdf]

Next: APPENDIX C Up: APPENDIX B Previous: APPENDIX B

2012-10-29