I don't think it's possible to apply this trick to 64-bit floats on 64-bit architecture, which OP mentions in the last sentence. You need a 52 x 52 -> 104 product. Modular 64 x 64 -> 64 multiplication gives you the 64 bottom bits exactly, widening 32 x 32 -> 64 multiplication approximately gives you the top 32 bits. That leaves 104 - 64 - 32 = 8 bits that are not accounted for at all. Compare with the 32-bit case, where the same arithmetic gives 46 - 32 - 16 = -2, i.e. a 2-bit overlap the method relies on.
wren69911 day ago | | | parent | | on: 47753626
Yeah, it feels intuitively like it should work, but even with full 64-bit modular multiply the bounds don't overlap. I should probably delete that sentence, but on the other hand maybe the wild goose chase will lead to someone finding a better trick. For now my double-precision multiply is just direct expansion.
purplesyringa1 day ago | | | parent | | on: 47755898
I think it fails because it seems like the difference between 32-bit and 64-bit floats is 2x, but in reality we should look at the mantissa, and the increase from 23 bits to 52 bits is much greater.

Although I managed to tweak this method to work with 3 multiplications.

ETA: I just realized you wanted to use 32x32 -> 64 products, while my approach assumes the existence of 64x64 -> 64 products; basically it's just a scaled-up version of the original question and likely not what you're looking for. Hopefully it's still useful though.

First, remove the bottom 8 bits of the two inputs and compute the 44x44->88 product. This can be done with the approach in the post. Then apply the algorithm again, combining that product together with the product of the bottom half of the input to get the full 52x52->104 output. The bounds are a bit tight, but it should work. Here's a numeric example:

    a = 98a67ee86f8cf
    b = da19d2c9dfe71

    (a >> 20) * (b >> 20)         = 820d2e04637bf428
    (a >> 8) * (b >> 8) % 2**64   =       0547f8cdb2100210
    ->
    (a >> 8) * (b >> 8)           = 820d2e0547f8cdb2100210

    (a >> 8) * (b >> 8)           = 820d2e0547f8cdb2100210
    (a * b) % 2**64               =           080978075f64355f
    ->
    a * b                         = 820d2e0548080978075f64355f
And my attempt at implementation: https://play.rust-lang.org/?version=stable&mode=release&edit...
purplesyringa1 day ago | | | parent | | on: 47757081
And here are the constants for the ostensibly more useful 54x54->108 product: https://play.rust-lang.org/?version=stable&mode=release&edit...

I tried to go even higher, but the bounds seems to break at 55 bits.