Here is another interesting BOLT article, this one on PostgreSQL optimization:
https://vondra.me/posts/playing-with-bolt-and-postgres/
"results are unexpectedly good, in some cases up to 40%"
JoelJacobson | 9 hours ago
One can try it out with CachyOS/Arch:
BSDobelix | 12 hours ago
Back in the day on the Mac, the order of source files in your project would determine locality in the binary.
If memory serves, this was with MPW C or maybe CodeWarrior.
You could see the jump (jmp) instructions use short jumps rather than long ones.
OnlyMortal | 7 hours ago
Does it work with code compiled by the Intel Fortran compiler?
kardos | 10 hours ago
Does anyone know of a Windows equivalent to BOLT?
vsskanth | 8 hours ago
Instruction cache and TLB thrashing is an often-overlooked consequence of code bloat, and sometimes of overly aggressive micro-benchmark-driven optimization.
Reorganizing the binary is an interesting way to minimize the cost, but I think any performance-oriented developer should keep in mind that most projects rarely depend on a single hot loop; they depend on many systems working together and competing for space in the cache(s).
For that reason, I generally use -Os instead of -O2 or -O3 in my projects and try to keep code bloat to a minimum.