I did manage to tweak this method to work with 3 multiplications, though.
ETA: I just realized you wanted to use 32x32 -> 64 products, while my approach assumes the existence of 64x64 -> 64 products; basically it's just a scaled-up version of the original question and likely not what you're looking for. Hopefully it's still useful though.
First, remove the bottom 8 bits of the two inputs and compute the 44x44->88 product. This can be done with the approach in the post. Then apply the algorithm again, combining that product with the product of the bottom halves of the inputs to get the full 52x52->104 output. The bounds are a bit tight, but it should work. Here's a numeric example:
a = 98a67ee86f8cf
b = da19d2c9dfe71
(a >> 20) * (b >> 20) = 820d2e04637bf428
(a >> 8) * (b >> 8) % 2**64 = 0547f8cdb2100210
->
(a >> 8) * (b >> 8) = 820d2e0547f8cdb2100210
(a * b) % 2**64 = 080978075f64355f
->
a * b = 820d2e0548080978075f64355f
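The gluing step in the example above can be sketched as follows. This is my own reconstruction of the overlap trick, not the linked playground code: `glue` and its `shift` parameter are names I invented, and `u128` is used only to hold the wide results — the multiplications themselves are all truncating 64-bit ones (plus one 32x32 product that happens to fit exactly).

```rust
/// Glue a truncated high product `hi` (approximating `p >> shift`) with
/// the low 64 bits `lo` of `p`. The two share `64 - shift` overlapping
/// bits, and the shortfall `(p >> shift) - hi` must be < 2^(64 - shift).
fn glue(hi: u128, lo: u64, shift: u32) -> u128 {
    let overlap = 64 - shift;
    let mask = (1u64 << overlap) - 1;
    // Recover the shortfall from the overlapping window, mod 2^overlap.
    let e = (lo >> shift).wrapping_sub(hi as u64) & mask;
    let top = hi + e as u128; // exact value of p >> shift
    ((top >> overlap) << 64) | lo as u128
}

fn main() {
    let a: u64 = 0x98a67ee86f8cf; // 52-bit inputs from the example
    let b: u64 = 0xda19d2c9dfe71;

    // Stage 1: 44x44 -> 88, from one exact 32x32 product and one
    // truncated 64-bit product (the overlap window is 40 bits wide).
    let h = (a >> 20).wrapping_mul(b >> 20); // 32x32, fits in 64 bits
    let l = (a >> 8).wrapping_mul(b >> 8);   // low 64 bits only
    let mid = glue(h as u128, l, 24);        // (a>>8)*(b>>8), 88 bits
    assert_eq!(mid, 0x820d2e0547f8cdb2100210);

    // Stage 2: 52x52 -> 104, gluing the 88-bit product from stage 1
    // with a*b mod 2^64 (the overlap window is 48 bits wide).
    let m = a.wrapping_mul(b);
    let full = glue(mid, m, 16);
    assert_eq!(full, 0x820d2e0548080978075f64355f);
    assert_eq!(full, (a as u128) * (b as u128)); // sanity check
    println!("{:x}", full);
}
```

The error bound is what makes the correction recoverable: in stage 1 the shortfall is below roughly 2^33, comfortably inside the 40-bit overlap, and in stage 2 it is below roughly 2^45, inside the 48-bit overlap — which is also why the bounds get tight as the input width grows.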
And my attempt at an implementation: https://play.rust-lang.org/?version=stable&mode=release&edit... I tried to go even higher, but the bounds seem to break at 55 bits.
https://www.quinapalus.com/qfplib.html
Nice write-up here, too; I like the idea of a firm float.
I wisely hit the back button :)
Then the full horror of it hit me.
The IEEE single-precision format with a hidden bit replaced earlier single-precision formats, most of which had a 24-bit significand and a 7-bit exponent. Those widths were chosen for alignment on byte boundaries, which simplified the software emulation of hardware floating-point units at a time when many computers lacked hardware FPUs.
When the idea of using a hidden bit freed one bit in the memory format, it had to be decided whether to use that bit for the exponent or for the significand.
The experience of using the older single-precision formats before 1980 indicated that the frequent overflows and underflows caused by a 7-bit exponent were more annoying than insufficient precision in the significand, so it was decided to use the hidden bit for increasing the exponent from 7 bits to 8 bits.
By 1980, efficient implementation in hardware had become the main consideration in the design of number formats, rather than whether some tricks could be useful for software emulation. The use of a hidden bit is inconvenient for software, which is why it had not been used earlier.
Some earlier formats had also used an 8-bit exponent and a hidden bit, but they differed from the IEEE format in that the exponent was byte-aligned and the sign bit was stored in the hidden bit's position. This simplified software emulation, but it had the disadvantage of making floating-point comparisons more complicated, as one could not compare the bytes in the same order as for integers.