According to the ARM ARM, __ARM_NEON__ is defined when Neon SIMD instructions are available. I'm having trouble getting GCC to provide it.
Neon is available on this BananaPi Pro dev board running Debian 8.2:
$ cat /proc/cpuinfo | grep neon
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
I'm using GCC 4.9:
$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Try GCC and -march=native:
$ g++ -march=native -dM -E - </dev/null | grep -i neon
#define __ARM_NEON_FP 4
OK, try what Google uses for Android when building for Neon:
$ g++ -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -dM -E - </dev/null | grep -i neon
#define __ARM_NEON_FP 4
Maybe ARMv7-a with hard float:
$ g++ -march=armv7-a -mfloat-abi=hard -dM -E - </dev/null | grep -i neon
#define __ARM_NEON_FP 4
My questions are:
__ARM_NEON__
?And maybe:
Related, on a LeMaker HiKey, which is AARCH64/ARM64 running Linaro with GCC 4.9.2, here's the output from the preprocessor:
$ cpp -dM </dev/null | grep -i neon
#define __ARM_NEON 1
According to ARM, this board does have Advanced SIMD instructions even though:
$ cat /proc/cpuinfo
Processor : AArch64 Processor rev 3 (aarch64)
...
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
There are a number of questions hidden in here; I'll try to address them in turn.
According to the ARM ARM, __ARM_NEON__ is defined when Neon SIMD instructions are available. I'm having trouble getting GCC to provide it.
That is compiler documentation for [an old version of] the ARM Compiler rather than the ARM Architecture Reference Manual. A better macro to check for the presence of the Advanced SIMD instructions would be __ARM_NEON, which is defined in the ARM C Language Extensions.
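As a minimal sketch of the macro check, the snippet below reports which spelling the compiler defined. The function name `neon_macro_seen` is hypothetical and only for illustration; the two macro names are the ones discussed above.

```c
/* Hypothetical helper: names the NEON feature macro the compiler defined.
 * Checks the ACLE spelling first, then the older one from the question. */
const char *neon_macro_seen(void)
{
#if defined(__ARM_NEON)
    return "__ARM_NEON";      /* spelling from the ARM C Language Extensions */
#elif defined(__ARM_NEON__)
    return "__ARM_NEON__";    /* older spelling asked about in the question */
#else
    return "none";            /* Advanced SIMD not enabled for this build */
#endif
}
```

Printing the result from a test program built with each set of flags should show quickly which combinations actually enable Advanced SIMD.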
Try GCC and -march=native:
As you may have found, GCC for the ARM target separates out -march (for the architecture revision for which GCC should generate code), -mfpu (for the floating point/Advanced SIMD unit available) and -mfloat-abi (for how floating point arguments should be passed, and for the presence or absence of a floating point unit). Finally there is -mtune (which asks GCC to try to optimise for a particular processor) and -mcpu (which acts as a combination of -mtune and -march).
By asking for -march=native you're asking GCC to generate code appropriate for the detected architecture of the processor on which you are running. This has no impact on the -mfpu setting, and so does not necessarily enable Advanced SIMD instruction generation.
Note that the above only applies to a compiler targeting AArch32. The AArch64 GCC does not support -mfpu and will detect presence of Advanced SIMD support through -march=native.
OK, try what Google uses for Android when building for Neon:
$ g++ -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -dM -E
These build flags are not sufficient to enable support for Advanced SIMD instructions; your notes may be incomplete. Of the -mfpu flags supported by GCC 4.9.2, I'd expect any of neon, neon-fp16, neon-vfpv4, neon-fp-armv8 or crypto-neon-fp-armv8 to give you what you want.
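To illustrate what the macro is typically used for once one of those -mfpu values is in effect, here is a hedged sketch that guards NEON intrinsics behind the feature macro and falls back to scalar code otherwise. The function name `add4_floats` and the macro `HAVE_NEON_INTRINSICS` are hypothetical; the intrinsics come from the standard `arm_neon.h` header.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON_INTRINSICS 1   /* hypothetical project-local switch */
#else
#define HAVE_NEON_INTRINSICS 0
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in 0..3. */
void add4_floats(const float *a, const float *b, float *out)
{
#if HAVE_NEON_INTRINSICS
    /* Load four floats from each input, add lane-wise, store the result. */
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));
#else
    /* Scalar fallback used when Advanced SIMD is not enabled. */
    for (size_t i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

Built without a NEON-capable -mfpu on AArch32, the scalar branch is compiled instead, so the same source still builds either way.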
According to ARM, this board does have Advanced SIMD instructions even though:
It looks like you're running on an AArch64 kernel, which exposes support for Advanced SIMD through the asimd feature, as in your example output.
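If you want to check the Features line programmatically, the difference in naming matters: an AArch32 kernel reports neon, an AArch64 kernel reports asimd. The sketch below is one way to test for either as a whole token; `features_has_flag` and `features_has_simd` are hypothetical helper names, and reading /proc/cpuinfo itself is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `flag` appears as a whole token in
 * `line` (tokens are separated by spaces, tabs, newlines or ':'), else 0. */
int features_has_flag(const char *line, const char *flag)
{
    size_t flag_len = strlen(flag);
    const char *p = line;

    if (flag_len == 0)
        return 0;

    while (*p != '\0') {
        /* Skip separators, then measure the next token. */
        p += strspn(p, " \t\n:");
        size_t tok_len = strcspn(p, " \t\n:");
        if (tok_len == flag_len && strncmp(p, flag, flag_len) == 0)
            return 1;
        p += tok_len;
    }
    return 0;
}

/* Hypothetical helper: Advanced SIMD shows up as "neon" on AArch32
 * kernels and as "asimd" on AArch64 kernels. */
int features_has_simd(const char *line)
{
    return features_has_flag(line, "neon") || features_has_flag(line, "asimd");
}
```

Matching whole tokens avoids false positives such as vfp matching vfpv3.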