Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 594 Vote(s) - 3.42 Average
  • 1
  • 2
  • 3
  • 4
  • 5
GCC 4.4: Avoid range check on switch/case statement in gcc?

#1
[**This is only an issue on GCC versions prior to 4.4, this was fixed in GCC 4.5.**](

[To see links please register here]

)

Is it possible to tell the compiler the variable used in a switch fits within the provided case statements? In particular if it's a small range and there's a jump table generated.

extern int a;
main()
{
switch (a & 0x7) { // 0x7 == 111 values are 0-7
case 0: f0(); break;
case 1: f1(); break;
case 2: f2(); break;
case 3: f3(); break;
case 4: f4(); break;
case 5: f5(); break;
case 6: f6(); break;
case 7: f7(); break;
}
}

I tried xor'ing to low bits (as the example), using enums, using gcc_unreachable() to no avail. The generated code always checks if the variable is inside the range, adding a pointless branch conditional and moving away the jump table calculation code.

Note: this is in the innermost loop of a decoder, performance matters significantly.

It seems I'm not the [only][1] [one][2].


[1]:

[To see links please register here]

[2]:

[To see links please register here]


> There is no way to tell gcc that the default branch is never taken,
although it will omit the default branch if it can prove that the
value is never out of range based on earlier conditional checks.

So, how do you help gcc prove the variable fits and there's no default branch in the example above? (Without adding a conditional branch, of course.)

Updates
----

1. This was on OS X 10.6 Snow Leopard with GCC 4.2 (default from Xcode.) It didn't happen with GCC 4.4/4.3 in linux (reported by Nathon and Jens Gustedt.)

2. The functions in the example are there for readability, think those are inlined or just statements. Making a function call on x86 is expensive.

Also the example, as mentioned in the note, belongs inside a loop on data (big data.)

The generated code with gcc 4.2/OS X is:

[...]
andl $7, %eax
cmpl $7, %eax
ja L11
mov %eax, %eax
leaq L20(%rip), %rdx
movslq (%rdx,%rax,4),%rax
addq %rdx, %rax
jmp *%rax
.align 2,0x90
L20:
.long L12-L20
.long L13-L20
.long L14-L20
.long L15-L20
.long L16-L20
.long L17-L20
.long L18-L20
.long L19-L20
L19:
[...]

The problem lies on `cmp $7, %eax;` `ja L11;`

3. OK, I'm going with the ugly solution and adding a special case for gcc versions below 4.4 using a different version without a switch and using goto and gcc's &&label extensions.

static void *jtb[] = { &&c_1, &&c_2, &&c_3, &&c_4, &&c_5, &&c_6, &&c_7, &&c_8 };
[...]
goto *jtb[a & 0x7];
[...]
while(0) {
c_1:
// something
break;
c_2:
// something
break;
[...]
}

Note the array of labels is static so it's not computed every call.
Reply

#2
Perhaps just use a `default` label for the fist or last case?
Reply

#3
I tried compiling something simple and comparable with -O5 and -fno-inline (my f0-f7 functions were trivial) and it generated this:
<pre><code>
8048420: 55 push %ebp ;; function preamble
8048421: 89 e5 mov %esp,%ebp ;; Yeah, yeah, it's a function.
8048423: 83 ec 04 sub $0x4,%esp ;; do stuff with the stack
8048426: 8b 45 08 mov 0x8(%ebp),%eax ;; x86 sucks, we get it
8048429: 83 e0 07 and $0x7,%eax ;; Do the (a & 0x7)
804842c: ff 24 85 a0 85 04 08 jmp *0x80485a0(,%eax,4) ;; Jump table!
8048433: 90 nop
8048434: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
8048438: 8d 45 08 lea 0x8(%ebp),%eax
804843b: 89 04 24 mov %eax,(%esp)
804843e: e8 bd ff ff ff call 8048400 <f6>
8048443: 8b 45 08 mov 0x8(%ebp),%eax
8048446: c9 leave
</code></pre>

Did you try playing with optimization levels?
Reply

#4
Perhaps you could use an array of function pointers instead of a switch ?

#include <stdio.h>

typedef void (*func)(void);

static void f0(void) { printf("%s\n", __FUNCTION__); }
static void f1(void) { printf("%s\n", __FUNCTION__); }
static void f2(void) { printf("%s\n", __FUNCTION__); }
static void f3(void) { printf("%s\n", __FUNCTION__); }
static void f4(void) { printf("%s\n", __FUNCTION__); }
static void f5(void) { printf("%s\n", __FUNCTION__); }
static void f6(void) { printf("%s\n", __FUNCTION__); }
static void f7(void) { printf("%s\n", __FUNCTION__); }

int main(void)
{
const func f[8] = { f0, f1, f2, f3, f4, f5, f6, f7 };
int i;

for (i = 0; i < 8; ++i)
{
f[i]();
}
return 0;
}

Reply

#5
This question is certainly interesting from the standpoint of a missed compiler optimization that is seemingly obvious to us, and I did spend considerable time trying to come up with a straightforward solution, largely out of personal curiousity.

That said, I have to admit **I am highly skeptical that this additional instruction will ever result in a measurable performance difference** in practice, especially on a new mac. If you have any significant amount of data, you'll be I/O bound, and a single instruction will never be your bottleneck. If you have a tiny amount of data, then you'll need to perform a *lot lot lot* of calculations repeatedly before a single instruction will become a bottleneck.

Would you post some code to show that there really is a performance difference? Or describe the code and data your working with?
Reply

#6
Have you tried declaring the `switch` variable as a bitfield?

struct Container {
uint16_t a:3;
uint16_t unused:13;
};

struct Container cont;

cont.a = 5; /* assign some value */
switch( cont.a ) {
...
}

Hope this works!
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through