I'm trying to figure out how to set the -march option properly, so that I can measure how much of a performance difference it makes on my PC with gcc 4.7.2 (option enabled versus disabled).
Before compiling anything, I tried to find the best -march option for my PC. My PC has a Pentium G850, whose microarchitecture is Sandy Bridge, so I looked through the gcc 4.7.2 manual and found that -march=corei7-avx seemed the best fit.
However, I remembered that Sandy Bridge-based Pentiums lack AVX and AES-NI support, and that is true for the Pentium G850. So -march=corei7-avx is not a proper option.
I came up with some potential options:
-march=corei7-avx -mno-avx -mno-aes
-march=corei7 -mtune=corei7-avx
-march=native
The first option looks reasonable given the information I have, but I'm worried that some feature other than AVX and AES-NI may also be missing. The second option looks safe, but because of -march=corei7 it could leave out some minor features that Sandy Bridge does have. The third option would take care of all my concerns, but I've heard it sometimes misdetects CPU features, so I would like to know how to do this manually.
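For a manual check that doesn't depend on GCC's -march=native detection, one possibility on Linux is to read the feature flags the kernel reports in /proc/cpuinfo. This is a minimal sketch, assuming Linux on x86. The function name has_cpu_flag is hypothetical, and the names checked are the kernel's flag names (for example pclmulqdq rather than GCC's pclmul).

```shell
#!/bin/sh
# Report a few CPU features as seen by the Linux kernel, independently of
# GCC's -march=native detection.  Assumes Linux on x86.
# has_cpu_flag FLAG [CPUINFO_FILE]: succeed if FLAG is in the first "flags" line.
has_cpu_flag() {
    # -w matches whole words only, so "avx" does not match "avx2".
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

for f in sse4_1 sse4_2 popcnt pclmulqdq aes xsave avx; do
    if has_cpu_flag "$f"; then
        echo "$f: yes"
    else
        echo "$f: no"
    fi
done
```

On a Pentium G850 one would expect aes and avx to come back as no; each answer can then be mapped by hand to the matching -m or -mno- switch.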
I've searched Google, StackOverflow and SuperUser, but I couldn't find a clear answer.
What options should be set?
What about letting GCC do the detection? For me (gcc-5.3.0, on an i5-2450M CPU in a Lenovo e520), the following command prints:
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
/usr/libexec/gcc/x86_64-pc-linux-gnu/5.3.0/cc1 -E -quiet -v - -march=sandybridge
-mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16
-msahf -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mno-abm -mno-lwp
-mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx
-mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd
-mno-f16c -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr
-mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd
-mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves
-mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma
-mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx
--param l1-cache-size=32 --param l1-cache-line-size=64
--param l2-cache-size=3072 -mtune=sandybridge -fstack-protector-strong
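That cc1 line is long, so a small filter can help pick out just the feature switches that -march=native turned on. A sketch, assuming a POSIX shell; enabled_march_flags is a hypothetical helper name, and -march=/-mtune= are deliberately skipped because they name a CPU rather than a single feature.

```shell
#!/bin/sh
# enabled_march_flags: read a cc1 command line on stdin and print, one per
# line, the -m switches that are turned ON ("-mxxx", not "-mno-xxx").
enabled_march_flags() {
    tr ' ' '\n' |
        grep -E '^-m' |
        grep -v -E '^-mno-|^-march=|^-mtune='
}

# Usage: run GCC's native detection and filter its cc1 line (if gcc exists).
if command -v gcc >/dev/null 2>&1; then
    gcc -march=native -E -v - </dev/null 2>&1 | grep cc1 | enabled_march_flags
fi
```

The printed list, together with the -march= value from the same line, could then be written out as an explicit set of options instead of relying on -march=native at every build.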