Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

#1
I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call `pow(a,2)` by compiling it into `a*a`, but the call `pow(a,6)` is not optimized and will actually call the library function `pow`, which greatly hurts performance. (In contrast, the Intel C++ Compiler, `icc`, will eliminate the library call for `pow(a,6)`.)
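
For reference, a minimal version of the code in question might look like this (the function names are mine, purely for illustration):

<!-- language: lang-c -->

    #include <math.h>

    double square(double a) { return pow(a, 2.0); }  /* GCC compiles this to a*a          */
    double sixth(double a)  { return pow(a, 6.0); }  /* GCC emits a call to pow() instead */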

What I am curious about is that when I replaced `pow(a,6)` with `a*a*a*a*a*a` using GCC 4.5.1 and options "`-O3 -lm -funroll-loops -msse4`", it uses 5 `mulsd` instructions:

    movapd %xmm14, %xmm13
    mulsd %xmm14, %xmm13
    mulsd %xmm14, %xmm13
    mulsd %xmm14, %xmm13
    mulsd %xmm14, %xmm13
    mulsd %xmm14, %xmm13

while if I write `(a*a*a)*(a*a*a)`, it will produce

    movapd %xmm14, %xmm13
    mulsd %xmm14, %xmm13
    mulsd %xmm14, %xmm13
    mulsd %xmm13, %xmm13

which reduces the number of multiply instructions to 3. `icc` has similar behavior.

Why do compilers not recognize this optimization trick?


#2
Because a 32-bit floating-point number - such as 1.024 - is not 1.024. In a computer, 1.024 is an interval: from (1.024-e) to (1.024+e), where "e" represents an error. Some people fail to realize this and believe that * in a*a stands for multiplication of arbitrary-precision numbers, with no errors attached to those numbers. The reason some people fail to realize this is perhaps the math they practiced in elementary school: working only with ideal numbers without errors attached, and believing that it is OK to simply ignore "e" while performing multiplication. They do not see the "e" implicit in "float a=1.2", "a*a*a" and similar C code.
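
You can see this directly; a minimal sketch, assuming an IEEE 754 `float` (the exact digits printed depend on the platform):

<!-- language: lang-c -->

    #include <stdio.h>

    int main(void) {
        float a = 1.024f;
        /* The nearest single-precision value to 1.024 is not 1.024 itself. */
        printf("%.17g\n", (double)a);  /* prints 1.0240000486373901 on IEEE 754 systems */
        return 0;
    }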

If the majority of programmers recognized (and were able to act on) the idea that the C expression a*a*a*a*a*a is not actually working with ideal numbers, then the GCC compiler would be FREE to optimize "a*a*a*a*a*a" into, say, "t=(a*a); t*t*t", which requires a smaller number of multiplications. But unfortunately, GCC does not know whether the programmer who wrote the code thinks of "a" as a number with or without an error. So GCC only does what the source code says, because that is all it sees with its "naked eye".

... once you know what kind of programmer *you* are, you can use the `-ffast-math` switch to tell GCC: "Hey, GCC, I know what I am doing!" This allows GCC to convert a*a*a*a*a*a into a different piece of text that looks different from a*a*a*a*a*a but still computes a number within the error interval of a*a*a*a*a*a. This is OK, since you already know you are working with intervals, not ideal numbers.
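
As a sketch (the function name is hypothetical), compare the assembly GCC emits with and without the switch:

<!-- language: lang-c -->

    /* Build with:  gcc -O2 -ffast-math -S pow6.c
       versus:      gcc -O2 -S pow6.c                                         */
    double pow6(double a) {
        return a * a * a * a * a * a;  /* may become (a*a*a)*(a*a*a) under -ffast-math */
    }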

#3
As Lambdageek pointed out, float multiplication is not associative, so regrouping can lose accuracy; but even when it gains accuracy, you can argue against the optimisation, because you may want a deterministic application. For example, in a client/server game simulation, where every client has to simulate the same world, you want floating-point calculations to be deterministic.
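
For a concrete (if extreme) illustration of how regrouping changes results, here is a minimal sketch using addition, where the effect is easiest to trigger:

<!-- language: lang-c -->

    #include <stdio.h>

    int main(void) {
        float big = 1e20f;
        /* Same operands, different grouping, different results:
           the 1.0f is either preserved or absorbed by the huge term. */
        printf("%g\n", (big + -big) + 1.0f);  /* prints 1 */
        printf("%g\n", big + (-big + 1.0f));  /* prints 0 */
        return 0;
    }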

#4
There are already a few good answers to this question, but for the sake of completeness I wanted to point out that the applicable section of the C standard is 5.1.2.2.3/15 (which is the same as section 1.9/9 in the C++11 standard). This section states that operators can only be regrouped if they are really associative or commutative.

#5
**Lambdageek** correctly points out that because associativity does not hold for floating-point numbers, the "optimization" of `a*a*a*a*a*a` to `(a*a*a)*(a*a*a)` may change the value. This is why it is disallowed by C99 (unless specifically allowed by the user, via compiler flag or pragma). Generally, the assumption is that the programmer wrote what she did for a reason, and the compiler should respect that. If you want `(a*a*a)*(a*a*a)`, write that.

That can be a pain to write, though; why can't the compiler just do [what you consider to be] the right thing when you use `pow(a,6)`? Because it would be the *wrong* thing to do. On a platform with a good math library, `pow(a,6)` is significantly more accurate than either `a*a*a*a*a*a` or `(a*a*a)*(a*a*a)`. Just to provide some data, I ran a small experiment on my Mac Pro, measuring the worst error in evaluating a^6 over all single-precision floating-point numbers in [1,2):

<!-- language: lang-none -->

    worst relative error using powf(a, 6.f): 5.96e-08
    worst relative error using (a*a*a)*(a*a*a): 2.94e-07
    worst relative error using a*a*a*a*a*a: 2.58e-07

Using `pow` instead of a multiplication tree reduces the error bound by a *factor of 4*. Compilers should not (and generally do not) make "optimizations" that increase error unless licensed to do so by the user (e.g. via `-ffast-math`).
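
For the curious, a sketch of how such a measurement might be reproduced (my own reconstruction, not the original test harness; it uses double-precision `pow` as the reference value):

<!-- language: lang-c -->

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double worst = 0.0;
        /* Walk every single-precision float in [1, 2). */
        for (float a = 1.0f; a < 2.0f; a = nextafterf(a, 2.0f)) {
            double ref = pow((double)a, 6.0);  /* reference, accurate to ~1 ulp of double */
            float  val = (a*a*a) * (a*a*a);    /* the expression under test               */
            double err = fabs(((double)val - ref) / ref);
            if (err > worst) worst = err;
        }
        printf("worst relative error: %.2e\n", worst);
        return 0;
    }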

Note that GCC provides `__builtin_powi(x,n)` as an alternative to `pow( )`, which should generate an inline multiplication tree. Use that if you want to trade off accuracy for performance, but do not want to enable fast-math.
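
For example (a minimal sketch; note that `__builtin_powi` is GCC-specific, not portable C):

<!-- language: lang-c -->

    /* Expands to an inline multiplication tree: faster than calling pow(),
       somewhat less accurate, and no -ffast-math required.                */
    double pow6(double a) {
        return __builtin_powi(a, 6);
    }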



#6
I would not have expected this case to be optimized at all. It can't be very common for an expression to contain subexpressions that can be regrouped to remove entire operations. I would expect compiler writers to invest their time in areas more likely to yield noticeable improvements, rather than covering a rarely encountered edge case.

I was surprised to learn from the other answers that this expression could indeed be optimized with the proper compiler switches. Either the optimization is trivial, or it is an edge case of a much more common optimization, or the compiler writers were extremely thorough.

There's nothing wrong with providing hints to the compiler as you've done here. It's a normal and expected part of the micro-optimization process to rearrange statements and expressions to see what differences they will bring.

While the compiler may be justified in considering the two expressions to deliver inconsistent results (without the proper switches), there's no need for you to be bound by that restriction. The difference will be incredibly tiny - so much so that if the difference matters to you, you should not be using standard floating point arithmetic in the first place.


