Is there a performance advantage to ARM64?

Jan Hudec · Nov 10, 2014 · Viewed 7.4k times

Recently, 64-bit ARM mobile devices started appearing. But is there any practical advantage to building an application as 64-bit? Specifically, consider an application that does not have much use for the increased virtual address space¹ but would waste some space due to the increased pointer size.

So does ARM64 have any advantages other than the larger address space that would actually warrant building such an application as 64-bit?

Note: I've seen 64-bit Performance Advantages, but it only mentions x86-64, which does have other improvements besides the extended virtual address space. I also recall that the situation is indeed specific to x86, and that on some other platforms that went 64-bit, like SPARC, the usual approach was to compile only the kernel and the applications that actually used a lot of memory as 64-bit, and everything else as 32-bit.


¹The application is multi-platform and still needs to be built for and run on devices with as little as 48 MiB of memory. It does have some large data that it reads from external storage, but it never needs more than a few megabytes of it at once.

Answer

unixsmurf · Nov 10, 2014

I am not sure a general response can be given, but I can provide some examples of differences. There are of course additional differences added in version 8 of the ARM architecture, which apply regardless of target instruction set.

Performance-positive additions in AArch64

  • 31 general-purpose registers (X0 to X30) give compilers more wiggle room.
  • I/D cache synchronization mechanisms accessible from user mode (no system call needed).
  • Load/Store-Pair instructions make it possible to load 128 bits of data with one instruction and still remain RISC-like.
  • The removal of near-universal conditional execution gives out-of-order execution more room.
  • The change in layout of NEON registers (D0 is still the lower half of Q0, but D1 is now the lower half of Q1 rather than the upper half of Q0) also gives out-of-order execution more room.
  • 64-bit pointers make pointer tagging possible.
  • CSEL enables all kinds of crazy optimizations.
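
To make the last two points a bit more concrete, here is a minimal C sketch. The function names and the tagging scheme (an 8-bit tag kept in bits 56 to 63) are my own assumptions for illustration, not anything the architecture prescribes. It also assumes the top byte of the pointer is otherwise unused, which may not hold on systems that already use those bits (for example, memory-tagging setups). A compiler targeting AArch64 will typically, though not necessarily, turn the ternary in `select_max` into a compare followed by CSEL instead of a branch.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free selection: AArch64 compilers typically lower this
   ternary to CMP + CSEL, but that is up to the compiler. */
int64_t select_max(int64_t a, int64_t b)
{
    return a > b ? a : b;
}

/* Hypothetical tagging scheme: an 8-bit tag stored in bits 56..63. */
#define TAG_SHIFT 56
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

/* Pack a pointer and a tag into one 64-bit word. */
uint64_t tag_ptr(const void *p, uint8_t tag)
{
    uint64_t bits = (uint64_t)(uintptr_t)p;
    assert((bits & ~ADDR_MASK) == 0); /* assumption: top byte is free */
    return bits | ((uint64_t)tag << TAG_SHIFT);
}

/* Strip the tag so the pointer can be dereferenced portably. */
void *untag_ptr(uint64_t tagged)
{
    return (void *)(uintptr_t)(tagged & ADDR_MASK);
}

/* Recover the tag from a packed word. */
uint8_t ptr_tag(uint64_t tagged)
{
    return (uint8_t)(tagged >> TAG_SHIFT);
}
```

The sketch masks the tag off before every dereference, so it does not rely on the hardware ignoring the top byte.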

Performance-negative changes in AArch64

  • More registers may also mean higher pressure on the stack.
  • Larger pointers mean larger memory footprint.
  • Removal of near-universal conditional execution may cause higher pressure on branch predictor.
  • Removal of load/store-multiple means more instructions needed for function entry/exit.
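
The footprint point is the easiest one to measure. The following C sketch uses two hypothetical node layouts of my own: one linked through raw pointers, one linked through 32-bit indices into a pool. On a typical LP64 target I would expect the pointer version to take 24 bytes (two 8-byte pointers, a 4-byte value, and 4 bytes of padding) against 12 bytes for the index version, but the exact pointer-node size depends on the ABI, so check it on your own targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linked through pointers: grows when pointers grow to 64 bits. */
struct ptr_node {
    struct ptr_node *next;
    struct ptr_node *prev;
    uint32_t value;
};

/* Linked through 32-bit indices into a node pool: same size on
   32-bit and 64-bit builds. */
struct idx_node {
    uint32_t next;
    uint32_t prev;
    uint32_t value;
};

/* Three 4-byte fields with 4-byte alignment: no padding expected. */
static_assert(sizeof(struct idx_node) == 12, "idx_node should be 12 bytes");

size_t ptr_node_size(void) { return sizeof(struct ptr_node); }
size_t idx_node_size(void) { return sizeof(struct idx_node); }
```

If a 64-bit build is otherwise attractive, replacing pointers with indices in the hot data structures is one way to claw back the memory.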

Performance-relevant changes in ARMv8-A

  • Load-Acquire/Store-Release semantics remove the need for explicit memory barriers for basic synchronization operations.
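
In C11 terms, this is what `memory_order_release` and `memory_order_acquire` map onto. Below is a minimal sketch of a one-slot mailbox; the `mailbox` type and its functions are hypothetical names of mine. On ARMv8-A, compilers typically implement the store and load below with the store-release and load-acquire instructions (the STLR/LDAR family), whereas on ARMv7 the same code needs separate barrier instructions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct mailbox {
    int payload;       /* plain data, written before publishing */
    atomic_bool ready; /* publication flag */
};

void mailbox_init(struct mailbox *m)
{
    assert(m);
    m->payload = 0;
    atomic_init(&m->ready, false);
}

/* Producer: write the data, then publish it with a release store. */
void mailbox_publish(struct mailbox *m, int value)
{
    assert(m);
    m->payload = value;
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Consumer: an acquire load of the flag makes the payload write
   visible if the flag is seen as set. */
bool mailbox_try_read(struct mailbox *m, int *out)
{
    assert(m && out);
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *out = m->payload;
    return true;
}
```

No explicit fence appears anywhere in the source; the ordering is carried by the store and the load themselves.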

I probably forgot lots of things, but those are some of the more obvious changes.