Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 349 Vote(s) - 3.66 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How does this 128 bit integer multiplication work in assembly (x86-64)?

#1
I'm reading [*Computer Systems: A Programmer's Perspective*](

[To see links please register here]

) and the homework was to describe how this algorithm works.

C function:

void store_prod(__int128 *dest, int64_t x, int64_t y) {
*dest = x * (__int128)y;
}

Assembly:

movq %rdx, %rax
cqto
movq %rsi, %rcx
sarq $63, %rcx
imulq %rax, %rcx
imulq %rsi, %rdx
addq %rdx, %rcx
mulq %rsi
addq %rcx, %rdx
movq %rax, (%rdi)
movq %rdx, 8(%rdi)
ret

I don't know why it performs: `xh * yl + yh * xl = value which we add after unsigned multiplication`
Reply

#2
What GCC is doing is using the property that signed multiplication can be done using [the following formula](

[To see links please register here]

).

(hi,lo) = unsigned(x*y)
hi -= ((x<0) ? y : 0) + ((y<0) ? x : 0)

Despite the fact that there is no need to do this since in this case the x86-64 instruction set has a signed 64-bit*64-bit to 128-bit instruction (`imul` with one operand) this formula is useful in other cases. For example for implementing [signed 128-bit multiplication with SSE2/AVX2/AVX512](

[To see links please register here]

) or for implementing [256-bit multiplication when the instruction set only does 128-bit multiplication](

[To see links please register here]

) (such as with x86-64).

GCC implemented this formula a bit differently though. If we take the sign bit and extend it to the whole word, call this function `sign_ext`, then the function returns `-1` or `0`. Then what GCC did is:

hi += sign_ext(x)*y + sign_ext(y)*x

for example `sign_ext(x)*y` in pseudo-instructions for 64-bit words is

sarq $63, x ; sign_ext(x)
imulq y, x ; sign_ext(x)*y

So now you ask (or meant to ask):

> Why is this formula true?

That's a good qeustion. I asked this same question as well and [njuffa wrote](

[To see links please register here]

)

> @Zboson: It follows directly from two's complement complement representation. E.g. 32-bit integers `-n` and `-m` are represented as unsigned numbers `x=2**32-n, y=2**32-m`. If you multiply those you have `x*y = 2**64 - 2**32*n - 2**32*m + n*m`. The middle terms indicate the necessary corrections to the upper half of the product. Working through a simple example using -1*-1 should prove very instructive.
Reply

#3
In order to understand why do we do this operations, try to interpret int128_t as: 2^64 * xh + xl

so if we want to multiply two int128_t integers, we will do the following:

x = 2^64 * xh + xl

y = 2^64 * yh + yl

so x * y = (2^128 * xh * yh) + (2^64 * xh * yl) + (2^64 * yh * xl) + (yl * xl)

And this is exactly, what the assembly code does:

yh = %rdx yl = %rax

xh = %rcx xl = %rsi


2^64 * xh * yl: is `imulq %rax, %rcx` 2^64 indicates, that we need to add this to the high order bits

2^64 * yh * xl: is `imulq %rsi, %rdx` 2^64 indicates, that we need to add this to the high order bits


2^128 * xh * yh: This operation is not needed, since `2^128 * xh * yh` won't fit in 128 bit integer. It represents only sign bit information and may be ignored.

xl * yl: is `mulq %rsi`

I hope this clears things up!
Reply

#4
As always, **compiler options matter**. That source code with `gcc -Og` (optimize for debugging) [produces very similar asm to your listing][1] (the cast sign-extends both operands to 128-bit before doing a full 128x128 => 128-bit multiply). This is a naive implementation of exactly what the C standard says should happen (integer precedence rules for converting both operands to the same type).

If you're going to talk about compiler output, you should always say which version of which compiler with what options. Or just post a link to it on [godbolt][2], like the one above. (Edit: oops, source and asm were from a book that didn't give that info. And if that's the global edition of CS:APP 3e, beware that [the practice problems are filled with errors][3] in the global edition.)

With `gcc -O3` or `-O2`, GCC takes advantage of the fact that both operands are still really only 64bit, [so a single `imul` is enough][4]. (This still produces the same result for every input, and thus still implements the C logic, per the as-if rule. C doesn't have widening operations so you're forced to write the source in an "inefficient" way that depends on the compiler to transform it into efficient asm.)

---

The `sar $63, %rcx` is part of sign-extending `rsi` into `rcx:rsi`, just like `cqto` sign-extends `rax` into `rdx:rax`. It replaces every bit of RCX with a copy of the original sign bit.

---

Most of this answer was already given by other people in comments, but I don't think anyone else noticed that `gcc -Og` / `-O1` gives almost exactly that asm output.


[1]:

[To see links please register here]

:'0',j:1,lang:c%2B%2B,source:'%23include+%3Cstdint.h%3E%0A%0Avoid+store_prod(__int128+*dest,+int64_t+x,+int64_t+y)+%7B%0A++++*dest+%3D+x+*+(__int128)y%3B%0A%7D'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:g520,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'1',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,libs:!(),options:'-Wall+-Og+-fverbose-asm+-march%3Dhaswell',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'x86-64+gcc+5.2+(Editor+%231,+Compiler+%231)+C%2B%2B',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4
[2]:

[To see links please register here]

[3]:

[To see links please register here]

[4]:

[To see links please register here]

:'0',j:1,lang:c%2B%2B,source:'%23include+%3Cstdint.h%3E%0A%0Avoid+store_prod(__int128+*dest,+int64_t+x,+int64_t+y)+%7B%0A++++*dest+%3D+x+*+(__int128)y%3B%0A%7D'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:g520,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'1',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,libs:!(),options:'-Wall+-O3+-fverbose-asm+-march%3Dhaswell',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'x86-64+gcc+5.2+(Editor+%231,+Compiler+%231)+C%2B%2B',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through