In `linux/arch/x86/include/asm/switch_to.h` there is the definition of the macro `switch_to`. The key lines, which do the actual thread switch, read like this (up to Linux 4.7, when the code changed):

    asm volatile("pushfl\n\t"                 /* save flags */    \
                 "pushl %%ebp\n\t"            /* save EBP */      \
                 "movl %%esp,%[prev_sp]\n\t"  /* save ESP */      \
                 "movl %[next_sp],%%esp\n\t"  /* restore ESP */   \
                 "movl $1f,%[prev_ip]\n\t"    /* save EIP */      \
                 "pushl %[next_ip]\n\t"       /* restore EIP */   \
                 __switch_canary                                  \
                 "jmp __switch_to\n"          /* regparm call */  \
                 "1:\t"                                           \
                 "popl %%ebp\n\t"             /* restore EBP */   \
                 "popfl\n"                    /* restore flags */ \

The named operands have memory constraints like `[prev_sp] "=m" (prev->thread.sp)`. `__switch_canary` is defined to nothing unless `CONFIG_CC_STACKPROTECTOR` is defined (then it's a load and store using `%ebx`).

I understand how it works: the kernel stack pointer is saved and restored, and `push next->eip` followed by `jmp __switch_to` acts as a "fake" `call`, which is matched by the real `ret` at the end of `__switch_to`. That effectively makes `next->eip` the return point, so execution resumes in the next thread.

What I don't understand is: why the hack? Why not just `call __switch_to` and then, after it returns, `jmp` to `next->eip`, which would be cleaner and easier to read?

There are two reasons for doing it this way.

One is to allow complete flexibility of operand/register allocation for `[next_ip]`. If you want to be able to do the `jmp %[next_ip]` _after_ the `call __switch_to` then it is necessary to have `%[next_ip]` allocated to a _nonvolatile register_ (i.e. one that, by the ABI definitions, will _retain its value_ when making a function call).

That restricts the compiler's ability to optimize, and the resulting code for `context_switch()` (the 'caller', where `switch_to()` is used) might not be as good as it could be. But for what benefit?

Well, that's where the second reason comes in: none, really, because `call __switch_to` followed by the jump would be equivalent to:

        pushl $1f            /* push the address of label 1 as the return address */
        jmp __switch_to
    1:  jmp *%[next_ip]      /* indirect jump through the operand */

That is, `call` pushes the return address, so you'd end up with the sequence `push`/`jmp` (`== call`)/`ret`/`jmp`. If you do not want to return to this place (and this code doesn't), "faking" the call saves a branch, because you only need `push`/`jmp`/`ret`. In effect, the code turns the call into a _tail call_.

Yes, it's a small optimization, but avoiding a branch reduces latency and latency is critical for context switches.
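
To make the branch count concrete, here is a minimal sketch in plain C that models the two sequences on a toy machine. Everything in it is hypothetical and only for illustration: `struct toy_cpu`, the `ADDR_*` constants, and the helper names are not kernel code, and the model counts control transfers only, not real CPU latency.

```c
#include <assert.h>

/* Hypothetical addresses standing in for __switch_to, label 1 and next->eip. */
enum { ADDR_SWITCH_TO = 100, ADDR_LABEL_1 = 200, ADDR_NEXT_IP = 300 };

/* Toy machine: just enough state to count control transfers. */
struct toy_cpu {
    int stack[8];
    int sp;         /* number of entries currently on the stack */
    int pc;         /* current "instruction pointer" */
    int transfers;  /* control transfers executed so far */
};

static void push(struct toy_cpu *c, int value)
{
    assert(c->sp < 8);
    c->stack[c->sp++] = value;
}

static void jmp(struct toy_cpu *c, int target)
{
    c->pc = target;
    c->transfers++;
}

/* A call is a push of the return address followed by a jump. */
static void call(struct toy_cpu *c, int target, int return_addr)
{
    push(c, return_addr);
    jmp(c, target);
}

/* A ret pops the return address and transfers control to it. */
static void ret(struct toy_cpu *c)
{
    assert(c->sp > 0);
    c->pc = c->stack[--c->sp];
    c->transfers++;
}

/* Models: call __switch_to; 1: jmp *next_ip */
static struct toy_cpu real_call_then_jump(void)
{
    struct toy_cpu c = {0};
    call(&c, ADDR_SWITCH_TO, ADDR_LABEL_1);
    ret(&c);                 /* __switch_to returns to label 1 */
    jmp(&c, ADDR_NEXT_IP);   /* extra jump to reach next->eip */
    return c;
}

/* Models: pushl next_ip; jmp __switch_to */
static struct toy_cpu fake_call(void)
{
    struct toy_cpu c = {0};
    push(&c, ADDR_NEXT_IP);
    jmp(&c, ADDR_SWITCH_TO);
    ret(&c);                 /* __switch_to "returns" straight to next->eip */
    return c;
}
```

In this model both functions finish with `pc` at `ADDR_NEXT_IP` and an empty stack, but `real_call_then_jump()` records three control transfers while `fake_call()` records two.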
