Assembly vs. Intrinsics (2014)

lelf | 69 points

Here is an example of a program that would not have been written without Intel's GCC intrinsics:

https://NN-512.com

Intrinsics are better than direct assembly because GCC can simplify, combine, and reorder instructions (e.g., lift them out of loops). GCC handles register allocation, etc. Intrinsics dramatically simplify the programmer's job
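A minimal sketch of the loop-lifting point, not taken from NN-512: the function name `scale_floats` is made up for illustration, and the scalar fallback is only there so the snippet builds without `-mavx512f`. The broadcast is deliberately written inside the loop. Because it is an intrinsic rather than an opaque asm block, the compiler is free to hoist it out.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: dst[i] = src[i] * scale, n must be a multiple of 16. */
void scale_floats(float *dst, const float *src, size_t n, float scale)
{
    assert(n % 16 == 0);
#if defined(__AVX512F__)
    for (size_t i = 0; i < n; i += 16) {
        /* Loop-invariant broadcast, written inside the loop on purpose.
           The compiler may lift it out because it understands the intrinsic. */
        __m512 factor = _mm512_set1_ps(scale);
        __m512 v = _mm512_loadu_ps(src + i);
        _mm512_storeu_ps(dst + i, _mm512_mul_ps(v, factor));
    }
#else
    /* Scalar fallback (an assumption of this sketch, for non-AVX-512 builds). */
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * scale;
#endif
}
```

Whether the broadcast actually gets hoisted depends on the compiler version and flags, so it is still worth looking at the generated assembly.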

The GCC codegen is fantastic for NN-512, overall

GCC 8.3 and earlier make a few mistakes, like compiling FNMADD as xor-negation followed by FMADD (using an extra register for the xor-negation constant), but those problems have been fixed in GCC 9.1 and above
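For anyone unfamiliar with FNMADD, here is a rough sketch of the operation in question: `_mm512_fnmadd_ps(a, b, c)` computes `-(a * b) + c` per lane. The function name `subtract_products` is hypothetical, and the scalar branch is only an assumption so the code compiles without AVX-512 enabled.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: acc[i] = acc[i] - a[i] * b[i], n a multiple of 16. */
void subtract_products(float *acc, const float *a, const float *b, size_t n)
{
    assert(n % 16 == 0);
#if defined(__AVX512F__)
    for (size_t i = 0; i < n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        __m512 vc = _mm512_loadu_ps(acc + i);
        /* Fused negated multiply-add: -(va * vb) + vc in one instruction. */
        _mm512_storeu_ps(acc + i, _mm512_fnmadd_ps(va, vb, vc));
    }
#else
    /* Scalar fallback for builds without AVX-512 (not the vector path). */
    for (size_t i = 0; i < n; ++i)
        acc[i] = acc[i] - a[i] * b[i];
#endif
}
```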

The only codegen mistake I see in GCC 10 is when I load 512 bits from memory, convert the low 256 bits from packed-half to packed-single, and do the same thing for the high 256 bits. GCC sometimes reloads the 512 bits from memory (despite still having those bits in a register). It doesn't harm performance much, but it seems dumb. Not sure why GCC does this
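The pattern I mean looks roughly like the sketch below. This is an illustration, not code copied from NN-512: `convert_32_halves` and `half_bits_to_float` are hypothetical names, and the software conversion in the fallback branch exists only so the snippet builds without `-mavx512f`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#else
/* Fallback only: decode one IEEE 754 binary16 bit pattern to float. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0x1Fu) {                 /* infinity or NaN */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {              /* normal number: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man != 0) {              /* subnormal: normalize the mantissa */
        uint32_t shift = 0;
        while (!(man & 0x400u)) { man <<= 1; ++shift; }
        bits = sign | ((113u - shift) << 23) | ((man & 0x3FFu) << 13);
    } else {                            /* signed zero */
        bits = sign;
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
#endif

/* Hypothetical helper: convert 32 packed halves (512 bits) to 32 floats. */
void convert_32_halves(float *dst, const uint16_t *src)
{
    assert(dst != NULL && src != NULL);
#if defined(__AVX512F__)
    __m512i packed = _mm512_loadu_si512(src);               /* one 512-bit load */
    __m256i lo = _mm512_castsi512_si256(packed);            /* low 256 bits */
    __m256i hi = _mm512_extracti64x4_epi64(packed, 1);      /* high 256 bits */
    _mm512_storeu_ps(dst, _mm512_cvtph_ps(lo));
    _mm512_storeu_ps(dst + 16, _mm512_cvtph_ps(hi));
#else
    for (int i = 0; i < 32; ++i)
        dst[i] = half_bits_to_float(src[i]);
#endif
}
```

In the vector path both halves come from the same register, so a second load from memory should not be needed.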

GCC can reduce the liveness range of an in-register value by moving the producing instruction and consuming instruction closer together. This can be a big help if you're writing code that just barely fits in the register file. For example, NN-512 produces many loops that use almost all of the 32 ZMM vector registers. GCC generally does a good job avoiding spills, if the programmer doesn't make the job too hard

In my opinion, properly written C intrinsics produce very good AVX-512 machine code, much more easily than if I wrote the assembly by hand. You can write much larger, more complex, fully vectorized programs when GCC helps you

37ef_ced3 | 3 years ago

> The problem is that intrinsics are so unreliable that you have to manually check the result on every platform and every compiler you expect your code to be run on, and then tweak the intrinsics until you get a reasonable result. That's more work than just writing the assembly by hand.

Well that sounds very bad. Have things improved since this article was written? Are intrinsics best avoided?

Joeboy | 3 years ago

Another perspective is of course that of the embedded developer, a camp I count myself among.

In embedded software, it's not uncommon to have exactly one target for the software (commonly called "the target"). Sometimes the target changes, for example because components reach end-of-life, but that is rare and slow.

In those situations, I have found intrinsics to be very helpful since they allow you to reason and talk about the software at a higher level (C is, after all, higher than assembly) without having to make sure all developers on a team understand the inline assembly syntax. :)

It is still good practice to check the resulting code, especially as, if you're using intrinsics, chances are you're often thinking more or less in assembly, but you can do that once and be pretty sure you're getting the desired result.

The resulting code is of course also more portable, which can be helpful when you want to e.g. automate tests of code that has no external hardware dependencies, such as data structures, utility functions, and so on.
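As a rough illustration of that portability point (my own sketch, with a made-up function name `count_leading_zeros32`): wrap the builtin once and give it a plain C fallback, so the same source builds for the target and for a host-side test run. `__builtin_clz` is a real GCC builtin; everything else here is an assumption for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical wrapper: number of leading zero bits in a 32-bit value. */
unsigned count_leading_zeros32(uint32_t x)
{
    if (x == 0)
        return 32; /* __builtin_clz(0) is undefined, so handle it explicitly */
#if defined(__GNUC__) && UINT_MAX == 0xFFFFFFFFu
    /* GCC/Clang builtin; typically maps to a single instruction where one exists. */
    return (unsigned)__builtin_clz(x);
#else
    /* Portable fallback so host-side tests build with any C compiler. */
    unsigned n = 0;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        ++n;
    }
    return n;
#endif
}
```

A host-side unit test can then call `count_leading_zeros32` directly, with no target hardware involved.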

unwind | 3 years ago

He's right about assembly vs inline assembly (gcc asm). But something well done like Intel Intrinsics [1] specializes an intrinsic for target platforms. It's (a lot) more work on the intrinsic writer's part but then provides something of a cross platform abstraction for the programmer.

[1] https://software.intel.com/sites/landingpage/IntrinsicsGuide...

I think intrinsics are like C++ templates. Maybe you shouldn't be writing them unless you really know what you're doing.

CalChris | 3 years ago

(2014)

syrrim | 3 years ago