Zmij: Faster Floating Point Double-to-String Conversion
Key topics
The quest for faster double-to-string conversion just got a significant boost with the introduction of Zmij, a new algorithm that outpaces its predecessors. Commenters were impressed, with the creator of Grisu, an earlier breakthrough, praising Zmij's performance, and the author revealing that Zmij borrowed an idea from Cassio Neri's work on Teju Jaguá. As contributors chimed in, discussing potential comparisons with Teju Jaguá and sharing their own implementations, a broader question emerged: why do most research efforts focus on double-to-string conversion, leaving string-to-double algorithms in the shadows?
Snapshot generated from the HN discussion
Discussion Activity
Story posted Dec 14, 2025 at 10:42 AM EST; first comment 3 days later (Dec 17, 2025 at 3:45 PM EST); peak of 18 comments in the 84-96h window, averaging 7.2 per period; latest activity Dec 20, 2025 at 12:00 AM EST. Based on 43 loaded comments.
When I published Grisu (Google double-conversion), it was multiple times faster than the existing algorithms. I knew that there was still room for improvement, but I was at most expecting a factor of 2 or so. Six times faster is really impressive.
I wonder how Teju Jaguá compares. I don't see it in the C++ benchmark repo you linked, whose graph you included.
I have contributed an implementation in Rust :) https://crates.io/crates/teju. It includes benchmarks that compare it against Ryu and against Rust's std lib. It's quite easy to run if you're interested!
> A more interesting improvement comes from a talk by Cassio Neri, "Fast Conversion From Floating Point Numbers". In Schubfach, we look at four candidate numbers. The first two, of which at most one is in the rounding interval, correspond to a larger decimal exponent. The other two, of which at least one is in the rounding interval, correspond to the smaller exponent. Cassio’s insight is that we can directly construct a single candidate from the upper bound in the first case.
Another nice thing about your post is mentioning the "shell" of the algorithm, that is, actually translating the decimal significand and exponent to a string (as opposed to the "core", turning f * 2^e into f' * 10^e'). A decent chunk of the overall time is spent there, so it's worth optimising it as well.
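For illustration, here is a minimal sketch of such a shell, printing a decimal significand and exponent in scientific notation. The function and variable names are made up, not taken from Zmij, and fast implementations typically replace the digit loop with two-digit lookup tables:

    // Naive "shell": print significand * 10^exponent10 in scientific notation.
    #include <cstdint>
    #include <cstdio>
    #include <string>

    std::string write_scientific(uint64_t significand, int exponent10, bool negative) {
        std::string digits = std::to_string(significand);
        // The printed exponent refers to the leading digit, so account for
        // the digits that end up after the decimal point.
        int exp = exponent10 + static_cast<int>(digits.size()) - 1;
        std::string out;
        if (negative) out += '-';
        out += digits[0];
        if (digits.size() > 1) {
            out += '.';
            out.append(digits, 1, std::string::npos);
        }
        out += 'e';
        out += std::to_string(exp);
        return out;
    }

    int main() {
        // 17976931348623157 * 10^292 is the decimal form of DBL_MAX.
        std::printf("%s\n", write_scientific(17976931348623157ull, 292, false).c_str());
    }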
Error: roundtrip fail 4.9406564584124654e-324 -> '5.e-309' -> 4.9999999999999995e-309
Error: roundtrip fail 6.6302941479442929e-310 -> '6.6302941479443e-309' -> 6.6302941479442979e-309
Error: roundtrip fail -1.9153028533493997e-310 -> '-1.9153028533494e-309' -> -1.9153028533493997e-309
Error: roundtrip fail -2.5783653320086361e-312 -> '-2.57836533201e-309' -> -2.5783653320099997e-309
Is it, though? It's genuinely hard for me to tell.
Data sets get both serialized and deserialized, e.g. as JSON containing floating point numbers, which implies formatting and parsing, respectively.
Source code (including unit tests etc.) with hard-coded floating point values is compiled, linted, automatically formatted again and again, implying lots of parsing.
Code I usually work with ingests a lot of floating point numbers, but whatever is calculated is seldom displayed as formatted strings and more often gets plotted on graphs.
The conversion to string should produce a hexadecimal number, not a decimal number, so that both serialization and deserialization are trivial and they cannot introduce any errors.
Even if a human inspects the strings produced in this way, comparing numbers to see which is greater or less and examining the order of magnitude can be done as easily as with decimal numbers. Nobody will want to do arithmetic computations mentally with such numbers.
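For what it's worth, standard C and C++ already provide this: the "%a" conversion (and std::hexfloat) emits hexadecimal floating point, and strtod parses it back exactly:

    // Hexadecimal output encodes the bits directly, so the round trip is exact
    // and needs no shortest-digit search or high-precision parsing.
    #include <cassert>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        double x = 0.1;
        char buf[64];
        std::snprintf(buf, sizeof buf, "%a", x);   // e.g. "0x1.999999999999ap-4"
        std::printf("%s\n", buf);
        double y = std::strtod(buf, nullptr);
        assert(x == y);                            // exact round trip
    }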
Unlike formatting, correct parsing involves high precision arithmetic.
Example: the IEEE 754 double closest to the exact value "0.1" is 7205759403792794*2^-56, which has an exact value of A (see below). The next higher IEEE 754 double has an exact value of C (see below). Exactly halfway between these values is B=(A+C)/2.
A=0.1000000000000000055511151231257827021181583404541015625
B=0.100000000000000012490009027033011079765856266021728515625
C=0.10000000000000001942890293094023945741355419158935546875
So for correctness the algorithm needs the ability to distinguish the following extremely close values, because the first is closer to A (must parse to A) whereas the second is closer to C:
0.1000000000000000124900090270330110797658562660217285156249
0.1000000000000000124900090270330110797658562660217285156251
The problem of "string-to-double for the special case of strings produced by a good double-to-string algorithm" might be relatively easy compared to double-to-string, but correct string-to-double for arbitrarily big inputs is harder.
Parsing to binary is often undesirable to begin with.
https://old.reddit.com/r/rust/comments/omelz4/making_rust_fl...
Formatting also requires high precision arithmetic unless you disallow user-specified precision. That's why {fmt} still has an implementation of Dragon4 as a fallback for such silly cases.
https://vitaut.net/posts/2025/smallest-dtoa/
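A two-line illustration of why such a fallback is needed, using plain printf rather than {fmt}: nothing stops a caller from asking for far more digits than the shortest round-trip form provides, and those extra digits still have to be the exact ones.

    #include <cstdio>

    int main() {
        std::printf("%.17g\n", 0.1);  // 0.10000000000000001 (enough to round-trip)
        std::printf("%.60f\n", 0.1);  // the exact decimal expansion of the stored double, zero-padded to 60 places
    }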
And there’s one detail I found confusing. Suppose I go through the steps to find the rounding interval and determine that k=-3, so there is at most one integer multiple of 10^-3 in the interval (and at least one multiple of 10^-4). For the sake of argument, let’s say that -3 worked: m·10^-3 is in the interval.
Then, if m is not a multiple of 10, I believe that m·10^-3 is the right answer. But what if m is a multiple of 10? Then the result will be exactly equal, numerically, to the correct answer, but it will have trailing zeros. So maybe I get 7.460 instead of 7.46 (I made up this number and have no idea whether any double exists that gives this output). Even though that 6 is definitely necessary (there is no numerically different value with decimal exponent greater than -3 that rounds correctly), I still want my formatter library to give me the shortest decimal representation of the result.
Is this impossible for some reason? Is there logic hiding in the write function to simplify the answer? Am I missing something?
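One possible answer, as a sketch with illustrative names (not taken from the post): the writer can strip factors of 10 from the chosen significand before printing, which is roughly how Ryu-style implementations avoid trailing zeros.

    // Strip trailing zeros from the decimal significand so 7460 * 10^-3
    // becomes 746 * 10^-2, i.e. "7.46" rather than "7.460".
    #include <cstdint>
    #include <cstdio>

    void strip_trailing_zeros(uint64_t& significand, int& exponent10) {
        while (significand != 0 && significand % 10 == 0) {
            significand /= 10;
            ++exponent10;
        }
    }

    int main() {
        uint64_t m = 7460;   // the hypothetical value from the question
        int k = -3;
        strip_trailing_zeros(m, k);
        std::printf("%llu * 10^%d\n", (unsigned long long)m, k);  // 746 * 10^-2
    }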
Congratulations, can't wait to have some time to study this further.
C++ also provides countl_zero: https://en.cppreference.com/w/cpp/numeric/countl_zero.html. We currently use our own for maximum portability.
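For reference, the standard facility looks like this (C++20, <bit>); pre-C++20 code typically wraps a compiler builtin such as __builtin_clzll instead:

    #include <bit>
    #include <cstdint>
    #include <cstdio>

    int main() {
        uint64_t significand = 0x001fffffffffffffULL;             // a 53-bit double significand
        std::printf("%d\n", std::countl_zero(significand));       // 11 leading zeros
        std::printf("%d\n", 64 - std::countl_zero(significand));  // bit width: 53
    }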
I considered computing the table at compile time (you can do it in C++ using constexpr) but decided against it to avoid adding compile-time overhead, however small. The table never changes, so I'd rather not have users pay for recomputing it every time.
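For comparison, a constexpr table along the lines described could look like the sketch below (a stand-in powers-of-ten table, not the actual Zmij table); the alternative the author chose is to spell the entries out as literals so nothing is recomputed at build time.

    // Build a 10^0..10^18 table at compile time (C++17 constexpr).
    #include <array>
    #include <cstdint>
    #include <cstdio>

    constexpr std::array<uint64_t, 19> make_pow10() {
        std::array<uint64_t, 19> t{};
        t[0] = 1;
        for (int i = 1; i < 19; ++i) t[i] = t[i - 1] * 10;
        return t;
    }

    constexpr auto pow10 = make_pow10();   // computed by the compiler, stored in the binary

    int main() {
        std::printf("%llu\n", (unsigned long long)pow10[18]);  // 1000000000000000000
    }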
I have been doing some tests. Is it correct to assume that it converts 1.0 to "0.000000000000001e+15"?
Is there a test suite it is passing?
That is what Schubfach does.
The bottleneck is the 3 conditionals:
- positive or negative
- positive or negative exponent, x > 10.0
- correction for 1.xxxxx * 2^Y => fract(log10(2^Y)), 1.xxxxxxxx > 10.0