I did, however, manage to tweak this method to work with 3 multiplications.
ETA: I just realized you wanted to use 32x32 -> 64 products, while my approach assumes the existence of 64x64 -> 64 products; basically it's just a scaled-up version of the original question and likely not what you're looking for. Hopefully it's still useful though.
First, drop the bottom 8 bits of both inputs and compute the 44x44 -> 88 product. This can be done with the approach in the post, using the exact 32x32 -> 64 product (a >> 20) * (b >> 20) together with the wrapping product (a >> 8) * (b >> 8) mod 2^64. Then apply the algorithm again, combining that 88-bit product with the wrapping product a * b mod 2^64 to get the full 52x52 -> 104 output. The bounds are a bit tight, but it should work. Here's a numeric example:
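A minimal Rust sketch of the three-multiplication scheme, assuming 52-bit inputs (this is not the linked playground code; `combine` and `mul52` are hypothetical names). The key fact it relies on: the shifted product never exceeds the exact one and, for these shifts, undershoots it by less than 2^64, so the exact value is the approximation plus the wrapping difference of the low 64-bit words. `u128` is only used to hold and add the 104-bit result; it is never multiplied.

```rust
/// Recover an exact product from a lower-bound approximation and the exact
/// low 64 bits. Assumes approx <= exact < approx + 2^64.
fn combine(approx: u128, low: u64) -> u128 {
    // exact ≡ low (mod 2^64) and exact - approx lies in [0, 2^64),
    // so that difference is the wrapping difference of the low words.
    let delta = low.wrapping_sub(approx as u64);
    approx + delta as u128
}

/// 52x52 -> 104-bit product using three u64 multiplications.
fn mul52(a: u64, b: u64) -> u128 {
    debug_assert!(a >> 52 == 0 && b >> 52 == 0, "inputs must fit in 52 bits");
    // 1) 32x32 -> 64: exact, cannot overflow a u64.
    let p32 = (a >> 20) * (b >> 20);
    // 2) 44x44 -> 88: high bits from step 1, low 64 bits from a wrapping multiply.
    let low44 = (a >> 8).wrapping_mul(b >> 8);
    let p44 = combine((p32 as u128) << 24, low44);
    // 3) 52x52 -> 104: the same trick one level down.
    let low52 = a.wrapping_mul(b);
    combine(p44 << 16, low52)
}

fn main() {
    let a: u64 = 0x98a67ee86f8cf;
    let b: u64 = 0xda19d2c9dfe71;
    let p = mul52(a, b);
    println!("a * b = {:x}", p);
    // Compare against Rust's native 128-bit multiply.
    assert_eq!(p, a as u128 * b as u128);
}
```

Each call to `combine` is where the "approach in the post" is applied; the only other operations are shifts, a subtraction and an addition with carry.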
a = 98a67ee86f8cf
b = da19d2c9dfe71
(a >> 20) * (b >> 20) = 820d2e04637bf428
(a >> 8) * (b >> 8) % 2**64 = 0547f8cdb2100210
->
(a >> 8) * (b >> 8) = 820d2e0547f8cdb2100210
(a * b) % 2**64 = 080978075f64355f
->
a * b = 820d2e0548080978075f64355f
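The "bounds are a bit tight" remark can be checked directly. Writing a = A·2^s + a0 and b = B·2^s + b0, the shifted product A·B·2^(2s) undershoots a·b by (A·b0 + B·a0)·2^s + a0·b0, which is largest when every part is at its maximum. This sketch (the helper `max_error` is hypothetical) evaluates that worst case for the two combination steps of the 52-bit example, 44-bit operands shifted by 12 and 52-bit operands shifted by 8. Each step only works if the result stays below 2^64.

```rust
/// Worst-case undershoot of ((a >> s) * (b >> s)) << 2s relative to a * b,
/// for n-bit inputs a and b.
fn max_error(n: u32, s: u32) -> u128 {
    let hi = (1u128 << (n - s)) - 1; // largest possible a >> s
    let lo = (1u128 << s) - 1; // largest possible a mod 2^s
    // (A*b0 + B*a0) << s + a0*b0, with every term at its maximum.
    2 * hi * lo * (1u128 << s) + lo * lo
}

fn main() {
    let limit = 1u128 << 64;
    // Step 1: 44-bit operands (a >> 8), high part shifted by 12 more bits.
    // Step 2: 52-bit operands, high part shifted by 8 bits.
    for (n, s) in [(44, 12), (52, 8)] {
        let e = max_error(n, s);
        println!("n = {n}, s = {s}: max error ~ 2^{:.2}, fits: {}", (e as f64).log2(), e < limit);
    }
}
```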
And my attempt at an implementation: https://play.rust-lang.org/?version=stable&mode=release&edit...
I tried to go even higher, but the bounds seem to break at 55 bits.