Yeah, it feels intuitively like it should work, but even with full 64-bit modular multiply the bounds don't overlap. I should probably delete that sentence, but on the other hand maybe the wild goose chase will lead to someone finding a better trick. For now my double-precision multiply is just direct expansion.
purplesyringa1 day ago | | | parent | | on: 47755898
I think it fails because it seems like the difference between 32-bit and 64-bit floats is 2x, but in reality we should look at the mantissa, and the increase from 23 bits to 52 bits is much greater.

Although I managed to tweak this method to work with 3 multiplications.

ETA: I just realized you wanted to use 32x32 -> 64 products, while my approach assumes the existence of 64x64 -> 64 products; basically it's just a scaled-up version of the original question and likely not what you're looking for. Hopefully it's still useful though.

First, remove the bottom 8 bits of the two inputs and compute the 44x44->88 product. This can be done with the approach in the post. Then apply the algorithm again, combining that product together with the product of the bottom half of the input to get the full 52x52->104 output. The bounds are a bit tight, but it should work. Here's a numeric example:

    a = 98a67ee86f8cf
    b = da19d2c9dfe71

    (a >> 20) * (b >> 20)         = 820d2e04637bf428
    (a >> 8) * (b >> 8) % 2**64   =       0547f8cdb2100210
    ->
    (a >> 8) * (b >> 8)           = 820d2e0547f8cdb2100210

    (a >> 8) * (b >> 8)           = 820d2e0547f8cdb2100210
    (a * b) % 2**64               =           080978075f64355f
    ->
    a * b                         = 820d2e0548080978075f64355f
And my attempt at implementation: https://play.rust-lang.org/?version=stable&mode=release&edit...
purplesyringa1 day ago | | | parent | | on: 47757081
And here are the constants for the ostensibly more useful 54x54->108 product: https://play.rust-lang.org/?version=stable&mode=release&edit...

I tried to go even higher, but the bounds seems to break at 55 bits.