As a rule, we can rely on the compiler to optimize a binary so that the program runs as fast as possible. But the compiler doesn't know what hardware the program will run on. On top of that, we want compilation to take a reasonable amount of time. Both constraints can lead to suboptimal results. I suggest we look at code examples for LLVM and see how we can help the compiler optimize a program, and how our choices can make the result better or worse.
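To give a flavor of what "helping the compiler" can mean, here is a minimal sketch and not an example taken from the talk itself. The function names `add_scaled` and `add_scaled_noalias` are hypothetical, and `__restrict__` is a Clang/GCC extension rather than standard C++. Both functions compute the same thing; the second one only adds a promise that the two arrays never overlap, which the optimizer may use when vectorizing the loop.

```cpp
#include <cstddef>

// Baseline: dst and src are allowed to alias, so the compiler must assume
// that a store to dst[i] could change a later src[j]. It may emit a runtime
// overlap check before a vectorized loop, or keep the loop scalar.
void add_scaled(float* dst, const float* src, float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += k * src[i];
    }
}

// Hint: __restrict__ (Clang/GCC extension, an assumption of this sketch)
// promises that dst and src do not overlap. If the caller breaks that
// promise, the behavior is undefined, so the hint can also make things worse.
void add_scaled_noalias(float* __restrict__ dst,
                        const float* __restrict__ src,
                        float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += k * src[i];
    }
}
```

Whether the hint changes the generated code depends on the target and the flags. One way to check, assuming Clang is the compiler, is to build with `-O2 -Rpass=loop-vectorize` and compare the remarks for both functions, then repeat with `-march=native` to see what changes once the compiler knows the hardware.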
Presentation