Printing Floating-Point Numbers - Part 4: Results

March 05, 2014

Dragon CurveI knew this was going to be a difficult problem from the start, and it has lived up to the challenge. One of my initial concerns was how complicated the code would get to make it run at an acceptable speed. It started off terribly slow, but I'm happy to say it has arrived in a good place. Let's discuss the accuracy, performance and any considerations that should be made before using this implementation. You can also find a link to a simple demo project at the bottom of the page.

 

This is the fourth part of a four part series.

 

Accuracy

I am fairly confident that the output is always correct, but there is a very large domain of inputs and I can't validate them all. In addition to testing an assortment of numbers (e.g. the largest double, 1.0, pi, etc), I relied heavily on the following methods.

 

Round trip tests

One interesting route for testing is to print out a value and then read it back in with strtod checking that you get the initial number back. Because MSVC 2010's strtod has some rounding bugs, I had to use David Gay's implementation. I also had to take into account that strtod only supports 64-bit doubles as input. This is important because the minimal number of digits needed to uniquely identify a 32-bit number from all other 32-bit numbers is not guaranteed to be enough to separate it from all 64-bit numbers. This means that I cannot verify the results of PrintFloat32 without writing something like strtof. I can, however, verify the results of PrintFloat64.

 

I printed every possible 32-bit value with 64-bit precision and verified that it was accurately read back in. For 64-bit numbers, there are too many to test them all in a reasonable time frame. Instead, I periodically soak tested randomly generated 64-bit values and verified that they would round trip successfully.

 

Custom format tests

In part 1, I presented a table of all the numbers in a custom 6-bit floating-point format. I ran these same values through the Dragon4 algorithm which is a nice test case because I was able to manually verify the decimal representations of every possible number rather than blindly pumping them back into strtod and checking for round trip equality.

 

Performance

All of my performance tests were done with vanilla "maximize speed" settings in MSVC++ 2010 Express. Timing was done by calling QueryPerformanceCounter around a loop that would print 524288 (1024*512) numbers evenly distributed across the set of possible values (all floats for the 32-bit cases and all doubles for the 64-bit cases). I've listed the averaged call times in microseconds.

 

I did not do any extensive testing on specific number ranges (e.g. numbers with exponents larger than X), but it is to be expected that different implementations perform better and worse based on number categories they have optimized around. For example, David Gay's implementation uses a method from his paper allowing floating-point arithmetic when numbers have small exponents. I also didn't test against the open source Grisu implementation. I did included timing for Microsoft's implementation because it was so easy to test. I was surprised to see how fast it is, but I should also point out that it cannot print over 17 significant digits so it isn't a fair comparison in all cases.

 

One of the most notable advantages of my implementation is its ability to run at 32-bit precision which dramatically reduces the worst case integer size. I don't think there are many egregious issues left in my code, but one of the most obvious optimizations would be to branch on the formatting mode earlier in Dragon4 to remove all margin calculations when they aren't needed. After that, I think I would need to start adding the special case optimizations from David Gay's paper (which I have only skimmed over) or try to implement Grisu for the applicable subset of values. Unfortunately those routes end up adding a substantial amount of code and complexity to what I've presented.

 

32-bit Minimal

Printing 32-bit values with the minimal number of significant digits to identify the number.

ImplementationCodeAvg Time (uSecs)
RJ 32-bitPrintFloat32( buffer, bufferSize, value, PrintFloatFormat_Scientific, -1 );0.638497829
RJ 64-bitPrintFloat64( buffer, bufferSize, value, PrintFloatFormat_Scientific, -1 );1.216958523
David Gayg_fmt( buffer, value );1.404187202
MSVC++ 2010printf( buffer, "%.17g", value );0.898597717

 

64-bit Minimal

Printing 64-bit values with the minimal number of significant digits to identify the number.

ImplementationCodeAvg Time (uSecs)
RJ 64-bitPrintFloat64( buffer, bufferSize, value, PrintFloatFormat_Scientific, -1 );3.902363777
David Gayg_fmt( buffer, value );3.99230051
MSVC++ 2010printf( buffer, "%.17g", value );1.05797863

 

32-bit Sized

Printing 32-bit values with six digits after the decimal point.

ImplementationCodeAvg Time (uSecs)
RJ 32-bitPrintFloat32( buffer, bufferSize, value, PrintFloatFormat_Scientific, 6 );0.412725449
RJ 64-bitPrintFloat64( buffer, bufferSize, value, PrintFloatFormat_Scientific, 6 );0.441566467
MSVC++ 2010printf( buffer, "%.6e", value );0.77221489

 

64-bit Sized

Printing 64-bit values with six digits after the decimal point.

ImplementationCodeAvg Time (uSecs)
RJ 64-bitPrintFloat64( buffer, bufferSize, value, PrintFloatFormat_Scientific, 6 );1.400483608
MSVC++ 2010printf( buffer, "%.6e", value );0.919020653

 

Downloads

If you want to try out the code without coping and pasting it all from the website, you can download this test project.

  • PrintFloat.zip - Source code, Examples, and MSVC 2010 solution/project files