The caller and callee have to agree on what the registers and stack contain. This agreement is called the [calling convention][1], which is part of a larger concept called the [application binary interface (ABI)][2]. The callee defines how it wants to be called (_i.e._ whether arguments need to be on the stack, in registers, etc.) and the compiler ensures that the code it generates at each call site complies with that calling convention.
As for your specific question, it depends on the ABI. On 32-bit x86, for example, a return value larger than 4 bytes but not larger than 8 bytes can be split across EAX and EDX. For anything larger, the calling function will usually just allocate some memory (typically on the stack) and pass a pointer to this area to the called function, which writes the result there.
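To make the two cases concrete, here is a minimal C sketch. The names `struct big`, `make_u64`, `make_big`, `make_big_into` and `forms_agree` are made up for illustration, and the comments about registers describe what common 32-bit x86 conventions typically do, not something the C language guarantees. `make_big_into` is only a hand-written approximation of the hidden-pointer lowering that a compiler may apply to `make_big`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical aggregate, too large to fit in a register pair. */
struct big {
    int32_t values[8];
};

/* 8-byte result: on 32-bit x86 this is typically returned in EDX:EAX
 * (high half in EDX, low half in EAX). The exact registers depend on the ABI. */
uint64_t make_u64(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Return by value at the source level. Many ABIs lower this to a hidden
 * pointer argument that refers to storage reserved by the caller. */
struct big make_big(int32_t seed)
{
    struct big result;
    for (int i = 0; i < 8; i++) {
        result.values[i] = seed + i;
    }
    return result;
}

/* The same operation with the pointer spelled out: the caller owns the
 * storage and the callee fills it in. */
void make_big_into(struct big *out, int32_t seed)
{
    for (int i = 0; i < 8; i++) {
        out->values[i] = seed + i;
    }
}

/* Returns 1 when both forms produce identical contents. */
int forms_agree(int32_t seed)
{
    struct big by_value = make_big(seed);
    struct big via_pointer;
    make_big_into(&via_pointer, seed);
    assert(sizeof by_value == sizeof via_pointer);
    return memcmp(&by_value, &via_pointer, sizeof by_value) == 0;
}
```

Compiling this with something like `gcc -m32 -O1 -S` and reading the generated assembly is one way to check which registers or hidden arguments your particular toolchain actually uses.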
Note also that the role of the OS is not as important as you appear to think. Binaries with different calling conventions may coexist on the same system, and a binary can even use different calling conventions internally. The ABI of the OS only matters when the binary calls into the system libraries.