Two point raytracing for reflection off a 3D plane |

For the Lanczos root-finding methods in the previous appendix, a reciprocal
*cube* root is needed. Fortunately, this, too, can be obtained
using Newton's method in a division-free manner as follows:

Let

be the function whose root we want to find. Taking its derivative,

,

produces the Newton step

.

The remaining issue is choosing an appropriate first guess, , of the root in order to start the iteration. For this I again look to the fast reciprocal square root for guidance. McEniry (2007) reproduces a classic code (without the profane comment) containing a ``magic number'' from which half the integer representation of the floating point input is subtracted to produce an integer representation of the initial guess. This starting point was good enough that a single Newton iteration resulted in a worst case relative error of less than 0.175%. Mimicing McEniry's development yields the following code for a reciprocal cube root:

float InvCubeRoot ( float x ) { const float onethird = 0.333333333333; const float fourthirds = 1.333333333333; float thirdx = x * onethird; union { int ix; float fx; } z; z.fx = x; z.ix = 0x54a21d2a - z.ix/3; /* magic */ x = z.fx; x = x * ( fourthirds - thirdx * x*x*x ); /* max relerr < 2.34E-3 */ x = x * ( fourthirds - thirdx * x*x*x ); /* max relerr < 1.09E-5 */ return x; }

There is still one hitch--the ``magic'' line is not division free. Fortunately, the hacker and compiler community has worked out division-free integer division. For division by 3, this is accomplished by multiplying the numerator by the binary expansion 0.010101010101...of in fixed point arithmetic just like we were all taught in elementary school. For 32 bit numerators, we multiply by the hexadecimal constant 55555556 and shift the (64 bit) result down by 32 binary places. Therefore the ``magic'' line of code becomes

z.ix = 0x54a21d2a - (int) ((z.ix * (int64_t) 0x55555556)>>32); /* magic */where the tail end 6 instead of 5 in the multiplier handles the cases where the integer is not an exact multiple of 3.

Performing timing tests in C with random numbers, this algorithm ran 20 times
faster than calling *powf*(*x*,-1.0*f*/3.0*f*) from the C math
runtime library and about 10 times faster than my best previous effort to
calculate a fast reciprocal cube root.

Two point raytracing for reflection off a 3D plane |

2012-10-29