In C++, say that:
uint64_t i;
uint64_t j;
then i * j
will yield an uint64_t
that has as value the lower part of the multiplication between i
and j
, i.e., (i * j) mod 2^64
.
Now, what if I wanted the higher part of the multiplication? I know that there exists an assembly instruction do to something like that when using 32 bit integers, but I am not familiar at all with assembly, so I was hoping for help.
What is the most efficient way to make something like:
uint64_t k = mulhi(i, j);
If you're using gcc and the version you have supports 128 bit numbers (try using __uint128_t) than performing the 128 multiply and extracting the upper 64 bits is likely to be the most efficient way of getting the result.
If your compiler doesn't support 128 bit numbers, then Yakk's answer is correct. However, it may be too brief for general consumption. In particular, an actual implementation has to be careful of overflowing 64 bit integars.
The simple and portable solution he proposes is to break each of a and b into 2 32-bit numbers and then multiply those 32 bit numbers using the 64 bit multiply operation. If we write:
uint64_t a_lo = (uint32_t)a;
uint64_t a_hi = a >> 32;
uint64_t b_lo = (uint32_t)b;
uint64_t b_hi = b >> 32;
then it is obvious that:
a = (a_hi << 32) + a_lo;
b = (b_hi << 32) + b_lo;
and:
a * b = ((a_hi << 32) + a_lo) * ((b_hi << 32) + b_lo)
= ((a_hi * b_hi) << 64) +
((a_hi * b_lo) << 32) +
((b_hi * a_lo) << 32) +
a_lo * b_lo
provided the calculation is performed using 128 bit (or greater) arithmetic.
But this problem requires that we perform all the calculcations using 64 bit arithmetic, so we have to worry about overflow.
Since a_hi, a_lo, b_hi, and b_lo are all unsigned 32 bit numbers, their product will fit in an unsigned 64 bit number without overflow. However, the intermediate results of the above calculation will not.
The following code will implement mulhi(a, b) when the mathemetics must be performed modulo 2^64:
uint64_t a_lo = (uint32_t)a;
uint64_t a_hi = a >> 32;
uint64_t b_lo = (uint32_t)b;
uint64_t b_hi = b >> 32;
uint64_t a_x_b_hi = a_hi * b_hi;
uint64_t a_x_b_mid = a_hi * b_lo;
uint64_t b_x_a_mid = b_hi * a_lo;
uint64_t a_x_b_lo = a_lo * b_lo;
uint64_t carry_bit = ((uint64_t)(uint32_t)a_x_b_mid +
(uint64_t)(uint32_t)b_x_a_mid +
(a_x_b_lo >> 32) ) >> 32;
uint64_t multhi = a_x_b_hi +
(a_x_b_mid >> 32) + (b_x_a_mid >> 32) +
carry_bit;
return multhi;
As Yakk points out, if you don't mind being off by +1 in the upper 64 bits, you can omit the calculation of the carry bit.