
Printing Floating-Point Numbers - Part 4: Results

This is the fourth part of a four part series.
Accuracy
I am fairly confident that the output is always correct, but there is a very large domain of inputs and I can't validate them all. In addition to testing an assortment of numbers (e.g. the largest double, 1.0, pi, etc), I relied heavily on the following methods.
Round trip tests
One interesting route for testing is to print out a value and then read it back in with strtod checking that you get the initial number back. Because MSVC 2010's strtod has some rounding bugs, I had to use David Gay's implementation. I also had to take into account that strtod only supports 64-bit doubles as input. This is important because the minimal number of digits needed to uniquely identify a 32-bit number from all other 32-bit numbers is not guaranteed to be enough to separate it from all 64-bit numbers. This means that I cannot verify the results of PrintFloat32 without writing something like strtof. I can, however, verify the results of PrintFloat64.
I printed every possible 32-bit value with 64-bit precision and verified that it was accurately read back in. For 64-bit numbers, there are too many to test them all in a reasonable time frame. Instead, I periodically soak tested randomly generated 64-bit values and verified that they would round trip successfully.
Custom format tests
In part 1, I presented a table of all the numbers in a custom 6-bit floating-point format. I ran these same values through the Dragon4 algorithm which is a nice test case because I was able to manually verify the decimal representations of every possible number rather than blindly pumping them back into strtod and checking for round trip equality.
Performance
All of my performance tests were done with vanilla "maximize speed" settings in MSVC++ 2010 Express. Timing was done by calling QueryPerformanceCounter around a loop that would print 524288 (1024*512) numbers evenly distributed across the set of possible values (all floats for the 32-bit cases and all doubles for the 64-bit cases). I've listed the averaged call times in microseconds.
I did not do any extensive testing on specific number ranges (e.g. numbers with exponents larger than X), but it is to be expected that different implementations perform better and worse based on number categories they have optimized around. For example, David Gay's implementation uses a method from his paper allowing floating-point arithmetic when numbers have small exponents. I also didn't test against the open source Grisu implementation. I did included timing for Microsoft's implementation because it was so easy to test. I was surprised to see how fast it is, but I should also point out that it cannot print over 17 significant digits so it isn't a fair comparison in all cases.
One of the most notable advantages of my implementation is its ability to run at 32-bit precision which dramatically reduces the worst case integer size. I don't think there are many egregious issues left in my code, but one of the most obvious optimizations would be to branch on the formatting mode earlier in Dragon4 to remove all margin calculations when they aren't needed. After that, I think I would need to start adding the special case optimizations from David Gay's paper (which I have only skimmed over) or try to implement Grisu for the applicable subset of values. Unfortunately those routes end up adding a substantial amount of code and complexity to what I've presented.
32-bit Minimal
Printing 32-bit values with the minimal number of significant digits to identify the number.
| Implementation | Code | Avg Time (uSecs) |
| RJ 32-bit | PrintFloat32( buffer, bufferSize, value, PrintFloatFormat_Scientific, -1 ); | 0.638497829 |
| RJ 64-bit | PrintFloat64( buffer, bufferSize, value, PrintFloatFormat_Scientific, -1 ); | 1.216958523 |
| David Gay | g_fmt( buffer, value ); | 1.404187202 |
| MSVC++ 2010 | printf( buffer, "%.17g", value ); | 0.898597717 |
64-bit Minimal
Printing 64-bit values with the minimal number of significant digits to identify the number.
| Implementation | Code | Avg Time (uSecs) |
| RJ 64-bit | PrintFloat64( buffer, bufferSize, value, PrintFloatFormat_Scientific, -1 ); | 3.902363777 |
| David Gay | g_fmt( buffer, value ); | 3.99230051 |
| MSVC++ 2010 | printf( buffer, "%.17g", value ); | 1.05797863 |
32-bit Sized
Printing 32-bit values with six digits after the decimal point.
| Implementation | Code | Avg Time (uSecs) |
| RJ 32-bit | PrintFloat32( buffer, bufferSize, value, PrintFloatFormat_Scientific, 6 ); | 0.412725449 |
| RJ 64-bit | PrintFloat64( buffer, bufferSize, value, PrintFloatFormat_Scientific, 6 ); | 0.441566467 |
| MSVC++ 2010 | printf( buffer, "%.6e", value ); | 0.77221489 |
64-bit Sized
Printing 64-bit values with six digits after the decimal point.
| Implementation | Code | Avg Time (uSecs) |
| RJ 64-bit | PrintFloat64( buffer, bufferSize, value, PrintFloatFormat_Scientific, 6 ); | 1.400483608 |
| MSVC++ 2010 | printf( buffer, "%.6e", value ); | 0.919020653 |
Downloads
If you want to try out the code without coping and pasting it all from the website, you can download this test project.
- PrintFloat.zip - Source code, Examples, and MSVC 2010 solution/project files
Comments