Why Does -march=native Cause Crashes on My Zen2 Chip?

  • Thread starter: Vanadium 50
AI Thread Summary
The discussion revolves around the behavior of GCC's -march and -mtune options on a Zen2 chip. -mtune appears to have minimal impact on performance, especially under aggressive optimization flags like -O3 and -Ofast, suggesting the code may already be highly optimized. The default -march works adequately, and -march=znver1 yields no significant improvement; however, -march=native and -march=znver2 both cause hangs during pthread_create, even though -march=native should never produce code incompatible with the CPU it runs on. The inquiry began when the poster noticed that the hottest block of code contains multiply-and-add operations that could benefit from FMA instructions, prompting an exploration of compiler switches. Overall, the findings indicate that under these conditions the compiler's tuning options do not yield noticeable performance gains, and the -march hangs remain unexplained.
Vanadium 50
I don't understand the march/mtune behavior.

This is Linux GCC, on a Zen2 chip.

-mtune seems to do nothing much. OK, sometimes your code is as tuned as it's going to get out of the box. I am compiling with -O3 and -Ofast, so maybe it is already so optimized that there is little left to tune.

The default -march works fine. -march=znver1 works fine too, but again, no faster. OK, again, maybe there's little to be done. -march=native and -march=znver2 should do the same thing on this machine, and I guess they do: they both hang at pthread_create.

This is not really a problem - I don't really need to squeeze the last bit of performance out of the code - but it sure seems mysterious.

gcc -v gives

Code:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-host-pie --enable-host-bind-now --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugs.almalinux.org/ --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-plugin --enable-initfini-array --without-isl --enable-multilib --with-linker-hash-style=gnu --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_64=x86-64-v2 --with-arch_32=x86-64 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.1 20231218 (Red Hat 11.4.1-3) (GCC)
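One way to see whether -march=native and -march=znver2 really resolve to the same thing on a given machine is to query the driver directly. These are standard GCC query options, not specific to this setup, so this is only a sketch of how one might compare the two:

```shell
# Ask the driver what -march/-mtune actually resolve to under each setting;
# if the two greps print the same values, native really did pick znver2.
gcc -march=native -Q --help=target | grep -E '\-m(arch|tune)='
gcc -march=znver2 -Q --help=target | grep -E '\-m(arch|tune)='

# Dump the full cc1 command line, including every -m feature flag that
# -march=native turns on for this CPU:
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
```

Diffing the two cc1 command lines would show any feature flag (e.g. an AVX2 or FMA toggle) where the native detection disagrees with the explicit znver2 setting.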
 
Vanadium 50 said:
-mtune seems to do nothing much. OK, sometimes your code is as tuned as it's going to get out of the box. I am compiling with -O3 and -Ofast, so maybe it is already so optimized that there is little left to tune.
If you leave out those optimization options, does mtune seem to do more?
 
I have not played with that. I am more interested in my -march crashes. The setting -march=native should never generate code that the CPU cannot handle.
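A quick sanity check along those lines (the binary name ./a.out below is a placeholder - substitute the real executable) is to confirm the CPU actually advertises the Zen2-era extensions, and then see whether the "hang" is really an illegal instruction:

```shell
# List which Zen2-era ISA extensions this CPU reports (Linux only);
# fma, avx2, and bmi2 should all appear for a genuine Zen2 part.
grep -m1 -wo -e fma -e avx2 -e bmi2 /proc/cpuinfo | sort -u

# Run the program under gdb: a SIGILL at the stuck point would mean the
# generated code really is incompatible, while a clean stop inside
# pthread_create points at something else (e.g. a library or runtime issue).
gdb -batch -ex run -ex backtrace ./a.out
```

If gdb shows the process genuinely blocked rather than faulting, the problem is likely not an unsupported instruction at all.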

The way I got into this rabbit hole was noticing that the block of code that takes the longest has some multiply-and-adds. I wanted to see whether the compiler could speed it up by using FMA instructions. What could be simpler than throwing a compiler switch?
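The multiply-and-add case is easy to check in isolation. A minimal sketch (file and function names here are made up for illustration): compile a one-line multiply-add with FMA enabled and look for a vfmadd instruction in the generated assembly:

```shell
cat > fma_demo.c <<'EOF'
/* A multiply-and-add the compiler may contract into a single FMA. */
double muladd(double a, double b, double c) {
    return a * b + c;
}
EOF

# -mfma enables the FMA instructions without pulling in the rest of
# -march=znver2; -ffp-contract=fast permits the fusion (fusing changes
# rounding slightly, which is why the compiler will not always do it
# under strict floating-point settings).
gcc -O3 -mfma -ffp-contract=fast -S -o fma_demo.s fma_demo.c

# Any hit here means the contraction happened.
grep -o 'vfmadd[0-9]*sd' fma_demo.s | head -1
```

Running the same experiment on the real hot loop's translation unit would show whether the flags that matter are -march/-mtune at all, or just the FMA and contraction settings.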
 