Android NDK & ARM NEON instruction set extension support
========================================================

Introduction:
-------------

Android NDK r3 added support for the new 'armeabi-v7a' ARM-based ABI
that allows native code to use two useful instruction set extensions:

- Thumb-2, which provides performance comparable to 32-bit ARM
  instructions with compactness similar to Thumb-1.

- VFPv3, which provides hardware FPU registers and computations,
  to boost floating point performance significantly.

  More specifically, by default 'armeabi-v7a' only supports
  VFPv3-D16, which uses/requires only 16 hardware FPU registers
  (64 bits each).

More information about this is available in docs/CPU-ARCH-ABIS.html.

The ARMv7 Architecture Reference Manual also defines another optional
instruction set extension known as "ARM Advanced SIMD", nicknamed
"NEON". It provides:

- A set of scalar/vector instructions and registers (the registers
  share the same register bank as the FPU ones), comparable to
  MMX/SSE/3DNow! in the x86 world.

- VFPv3-D32 as a requirement (i.e. 32 hardware FPU 64-bit registers,
  instead of the minimum of 16).

Not all ARMv7-based Android devices support NEON, but those that do
may benefit significantly from the scalar/vector instructions.

The NDK can compile whole modules, or only specific source files, with
NEON support. This means that a specific compiler flag is used to enable
both the GCC ARM NEON intrinsics and VFPv3-D32 at the same time. The
intrinsics are described here:

> http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html


Using LOCAL_ARM_NEON:
---------------------

Define LOCAL_ARM_NEON to 'true' in your module definition, and the NDK
will build all of the module's source files with NEON support. This can
be useful if you want to build a static or shared library that
specifically contains NEON code paths.
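The following is a minimal sketch of what a source file built with NEON support might contain. It is an illustration only. The function name `add_floats` is hypothetical. The sketch also assumes that the compiler defines `__ARM_NEON__` (older GCC) or `__ARM_NEON` (ARM C Language Extensions) whenever NEON code generation is enabled. The preprocessor guard keeps the same file compilable as plain C elsewhere.

```c
#include <stddef.h>

/* Assumption: one of these macros is defined when NEON is enabled. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: computes dst[i] = a[i] + b[i] for i in [0, n). */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Add four floats per iteration using 128-bit NEON registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail, or the whole loop when NEON is not enabled. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Splitting the work into a vector loop and a scalar tail handles array lengths that are not a multiple of four.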


Using the .neon suffix:
-----------------------

When listing source files in your LOCAL_SRC_FILES variable, you now have
the option of using the .neon suffix to indicate that you want the
corresponding source(s) to be built with NEON support. For example:

    LOCAL_SRC_FILES := foo.c.neon bar.c

This builds only 'foo.c' with NEON support.

Note that the .neon suffix can be combined with the .arm suffix (used to
specify the 32-bit ARM instruction set for non-NEON instructions), but
it must appear after it.

In other words, 'foo.c.arm.neon' works, but 'foo.c.neon.arm' does NOT.


Build Requirements:
-------------------

NEON support only works when targeting the 'armeabi-v7a' or 'x86' ABI;
otherwise, the NDK build scripts will complain and abort. On x86, NEON is
only partially supported, through a translation header (to learn more
about it, see docs/CPU-X86.html). It is therefore important to use
checks like the following in your Android.mk:

    # define a static library containing our NEON code
    ifeq ($(TARGET_ARCH_ABI),$(filter $(TARGET_ARCH_ABI), armeabi-v7a x86))
    include $(CLEAR_VARS)
    LOCAL_MODULE    := mylib-neon
    LOCAL_SRC_FILES := mylib-neon.c
    LOCAL_ARM_NEON  := true
    include $(BUILD_STATIC_LIBRARY)
    endif # TARGET_ARCH_ABI == armeabi-v7a || x86


Runtime Detection:
------------------

As mentioned previously, NOT ALL ARMv7-BASED ANDROID DEVICES SUPPORT NEON!
It is thus crucial to perform runtime detection to know whether the
NEON-capable machine code can be run on the target device.

To do that, use the 'cpufeatures' library that comes with this NDK. To
learn more about it, see docs/CPU-FEATURES.html.

You should explicitly check that android_getCpuFamily() returns
ANDROID_CPU_FAMILY_ARM, and that android_getCpuFeatures() returns a value
that has the ANDROID_CPU_ARM_FEATURE_NEON flag set, as in:

    #include <cpu-features.h>

    ...
    ...

    if (android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM &&
        (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0)
    {
        // use NEON-optimized routines
        ...
    }
    else
    {
        // use non-NEON fallback routines instead
        ...
    }

    ...


Sample code:
------------

Look at the source code for the "hello-neon" sample in this NDK for an
example of how to use the 'cpufeatures' library and NEON intrinsics at the
same time.

The sample implements a tiny benchmark for a FIR filter loop, with a plain
C version and a NEON-optimized version for devices that support it.
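The sketch below follows the same idea, but it is not the hello-neon source. It contains a plain C FIR reference, a NEON variant that is compiled only when the compiler reports NEON support, and a selector that picks one of them through a function pointer. The following are all assumptions:

- The names `fir_c`, `fir_neon`, `fir_fn` and `select_fir` are hypothetical.
- The `cpu_has_neon` argument stands for the result of the cpufeatures check described under Runtime Detection.
- The compiler is assumed to define `__ARM_NEON` or `__ARM_NEON__` when NEON is enabled.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Plain C reference: out[n] = sum over k of in[n + k] * coef[k].
 * 'in' must hold at least n_out + taps - 1 samples. */
void fir_c(float *out, const float *in, size_t n_out,
           const float *coef, size_t taps)
{
    for (size_t n = 0; n < n_out; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; k++)
            acc += in[n + k] * coef[k];
        out[n] = acc;
    }
}

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
/* NEON variant: computes four consecutive outputs per iteration. */
void fir_neon(float *out, const float *in, size_t n_out,
              const float *coef, size_t taps)
{
    size_t n = 0;
    for (; n + 4 <= n_out; n += 4) {
        float32x4_t acc = vdupq_n_f32(0.0f);
        for (size_t k = 0; k < taps; k++) {
            /* acc[j] += in[n + k + j] * coef[k] for j = 0..3 */
            acc = vmlaq_n_f32(acc, vld1q_f32(in + n + k), coef[k]);
        }
        vst1q_f32(out + n, acc);
    }
    /* Leftover outputs (fewer than four) use the scalar reference. */
    if (n < n_out)
        fir_c(out + n, in + n, n_out - n, coef, taps);
}
#endif

typedef void (*fir_fn)(float *, const float *, size_t,
                       const float *, size_t);

/* Hypothetical selector: 'cpu_has_neon' is assumed to come from the
 * android_getCpuFamily()/android_getCpuFeatures() check. */
fir_fn select_fir(int cpu_has_neon)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    if (cpu_has_neon)
        return fir_neon;
#endif
    (void)cpu_has_neon;
    return fir_c;
}
```

For brevity, this sketch keeps everything in one file. Following the build rules above, a real module would more likely compile the NEON variant in a separate source (for example with the .neon suffix) and keep the fallback in a file built without NEON. That way the compiler cannot emit NEON instructions in code that must still run on devices without NEON.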