Why Does -march=native Cause Crashes on My Zen2 Chip?

  • Thread starter: Vanadium 50
AI Thread Summary
The discussion revolves around the behavior of GCC's -march and -mtune options on a Zen2 chip. -mtune appears to have minimal impact on performance, especially under aggressive optimization flags like -O3 and -Ofast, suggesting the code may already be highly optimized. The default -march works adequately, and -march=znver1 yields no significant improvement; however, -march=native and -march=znver2 both cause hangs during pthread_create, even though -march=native should never produce code incompatible with the CPU it runs on. The inquiry began when the poster noticed that the hottest block of code contains multiply-and-add operations that could benefit from FMA instructions, prompting an exploration of compiler switches. Overall, the findings indicate that under these conditions the compiler's tuning options do not yield noticeable performance gains, and the -march hangs remain unexplained.
Vanadium 50
I don't understand the march/mtune behavior.

This is Linux GCC, on a Zen2 chip.

-mtune seems to do nothing much. OK, sometimes your code is as tuned as it's going to get out of the box. I am compiling with -O3 and -Ofast, so maybe it is already so optimized that there is little left to tune.

The default -march works fine. -march=znver1 works fine too, but again, no faster. OK, again, maybe there's little to be done. -march=native and -march=znver2 should do the same thing on this machine, and I guess they do: they both hang at pthread_create.

This is not really a problem - I don't really need to squeeze the last bit of performance out of the code - but it sure seems mysterious.

gcc -v gives

Code:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-host-pie --enable-host-bind-now --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugs.almalinux.org/ --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-plugin --enable-initfini-array --without-isl --enable-multilib --with-linker-hash-style=gnu --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_64=x86-64-v2 --with-arch_32=x86-64 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.1 20231218 (Red Hat 11.4.1-3) (GCC)
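One way to see whether -march=native and -march=znver2 really resolve to the same thing on a given machine is to query the driver directly. These are standard GCC query options, not specific to this setup, so this is only a sketch of how one might compare the two:

```shell
# Ask the driver what -march/-mtune actually resolve to under each setting;
# if the two greps print the same values, native really did pick znver2.
gcc -march=native -Q --help=target | grep -E '\-m(arch|tune)='
gcc -march=znver2 -Q --help=target | grep -E '\-m(arch|tune)='

# Dump the full cc1 command line, including every -m feature flag that
# -march=native turns on for this CPU:
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
```

Diffing the two cc1 command lines would show any feature flag (e.g. an AVX2 or FMA toggle) where the native detection disagrees with the explicit znver2 setting.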
 
Vanadium 50 said:
-mtune seems to do nothing much. OK, sometimes your code is as tuned as it's going to get out of the box. I am compiling with -O3 and -Ofast, so maybe it is already so optimized that there is little left to tune.
If you leave out those optimization options, does mtune seem to do more?
 
I have not played with that. I am more interested in my -march crashes. The setting -march=native should never generate code that the CPU cannot handle.
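A quick sanity check along those lines (the binary name ./a.out below is a placeholder - substitute the real executable) is to confirm the CPU actually advertises the Zen2-era extensions, and then see whether the "hang" is really an illegal instruction:

```shell
# List which Zen2-era ISA extensions this CPU reports (Linux only);
# fma, avx2, and bmi2 should all appear for a genuine Zen2 part.
grep -m1 -wo -e fma -e avx2 -e bmi2 /proc/cpuinfo | sort -u

# Run the program under gdb: a SIGILL at the stuck point would mean the
# generated code really is incompatible, while a clean stop inside
# pthread_create points at something else (e.g. a library or runtime issue).
gdb -batch -ex run -ex backtrace ./a.out
```

If gdb shows the process genuinely blocked rather than faulting, the problem is likely not an unsupported instruction at all.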

The way I got into this rabbit hole was noticing that the block of code that takes the longest has some multiply-and-adds. I wanted to see whether the compiler could speed it up by using FMA instructions. What could be simpler than throwing a compiler switch?
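The multiply-and-add case is easy to check in isolation. A minimal sketch (file and function names here are made up for illustration): compile a one-line multiply-add with FMA enabled and look for a vfmadd instruction in the generated assembly:

```shell
cat > fma_demo.c <<'EOF'
/* A multiply-and-add the compiler may contract into a single FMA. */
double muladd(double a, double b, double c) {
    return a * b + c;
}
EOF

# -mfma enables the FMA instructions without pulling in the rest of
# -march=znver2; -ffp-contract=fast permits the fusion (fusing changes
# rounding slightly, which is why the compiler will not always do it
# under strict floating-point settings).
gcc -O3 -mfma -ffp-contract=fast -S -o fma_demo.s fma_demo.c

# Any hit here means the contraction happened.
grep -o 'vfmadd[0-9]*sd' fma_demo.s | head -1
```

Running the same experiment on the real hot loop's translation unit would show whether the flags that matter are -march/-mtune at all, or just the FMA and contraction settings.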
 