# TensorFlow Lite NNAPI delegate

The
[Android Neural Networks API (NNAPI)](https://developer.android.com/ndk/guides/neuralnetworks)
is available on all Android devices running Android 8.1 (API level 27) or
higher. It provides acceleration for TensorFlow Lite models on Android devices
with supported hardware accelerators including:

*   Graphics Processing Unit (GPU)
*   Digital Signal Processor (DSP)
*   Neural Processing Unit (NPU)

Performance will vary depending on the specific hardware available on the
device.

This page describes how to use the NNAPI delegate with the TensorFlow Lite
Interpreter in Java and Kotlin. For Android C APIs, please refer to the
[Android Native Developer Kit documentation](https://developer.android.com/ndk/guides/neuralnetworks).

## Trying the NNAPI delegate on your own model

### Gradle import

The NNAPI delegate is part of the TensorFlow Lite Android interpreter, release
1.14.0 or higher. You can import it to your project by adding the following to
your module gradle file:

```groovy
dependencies {
    implementation 'org.tensorflow:tensorflow-lite:2.0.0'
}
```

### Initializing the NNAPI delegate

Add the code to initialize the NNAPI delegate before you initialize the
TensorFlow Lite interpreter.

Note: Although NNAPI is supported from API Level 27 (Android Oreo MR1), support
for operations improved significantly from API Level 28 (Android Pie) onwards.
As a result, we recommend developers use the NNAPI delegate for Android Pie or
above for most scenarios.

```java
import android.os.Build;

import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.nnapi.NnApiDelegate;

Interpreter.Options options = new Interpreter.Options();
NnApiDelegate nnApiDelegate = null;
// Initialize the interpreter with the NNAPI delegate for Android Pie or above.
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.P) {
    nnApiDelegate = new NnApiDelegate();
    options.addDelegate(nnApiDelegate);
}

// Initialize the TFLite interpreter.
Interpreter tfLite;
try {
    tfLite = new Interpreter(loadModelFile(assetManager, modelFilename), options);
} catch (Exception e) {
    throw new RuntimeException(e);
}

// Run inference
// ...

// Unload the delegate when you are done.
tfLite.close();
if (nnApiDelegate != null) {
    nnApiDelegate.close();
}
```

## Best practices

### Test performance before deploying

Runtime performance can vary significantly depending on model architecture,
size, operations, hardware availability, and runtime hardware utilization. For
example, if an app heavily utilizes the GPU for rendering, NNAPI acceleration
may not improve performance due to resource contention. We recommend running a
simple performance test using the debug logger to measure inference time. Run
the test on several phones with different chipsets (from different
manufacturers, or different models from the same manufacturer) that are
representative of your user base before enabling NNAPI in production.
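As a concrete starting point, the following is a minimal sketch of such a
timing test. It assumes the `tfLite` interpreter from the initialization
snippet above, plus pre-allocated `inputBuffer` and `outputBuffer` objects
(hypothetical names) that match your model's input and output tensors:

```java
import android.os.SystemClock;
import android.util.Log;

// Warm up first: the initial runs include one-off costs such as accelerator
// compilation and memory allocation, which would skew the average.
for (int i = 0; i < 5; i++) {
    tfLite.run(inputBuffer, outputBuffer);
}

// Measure steady-state inference time over a fixed number of runs.
int runs = 50;
long startMs = SystemClock.elapsedRealtime();
for (int i = 0; i < runs; i++) {
    tfLite.run(inputBuffer, outputBuffer);
}
long elapsedMs = SystemClock.elapsedRealtime() - startMs;
Log.d("TfLiteTiming", "Average inference time: "
    + (elapsedMs / (float) runs) + " ms");
```

Run the same measurement once with the NNAPI delegate attached and once
without it, and compare the per-device averages before deciding to enable
NNAPI.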
For advanced developers, TensorFlow Lite also offers
[a model benchmark tool for Android](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark).

### Create a device exclusion list

In production, there may be cases where NNAPI does not perform as expected. We
recommend developers maintain a list of devices that should not use NNAPI
acceleration in combination with particular models. You can create this list
based on the value of `ro.board.platform`, which you can retrieve using the
following code snippet:

```java
import android.util.Log;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

String boardPlatform = "";

try {
    Process sysProcess =
        new ProcessBuilder("/system/bin/getprop", "ro.board.platform")
            .redirectErrorStream(true)
            .start();

    BufferedReader reader =
        new BufferedReader(new InputStreamReader(sysProcess.getInputStream()));
    String currentLine;

    while ((currentLine = reader.readLine()) != null) {
        boardPlatform = currentLine;
    }
    sysProcess.destroy();
} catch (IOException e) {
    // Treat a failure to read the property as an unknown platform.
}

Log.d("Board Platform", boardPlatform);
```
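You can then gate the delegate on that value. The following is a minimal
sketch, where `NNAPI_EXCLUDED_PLATFORMS` is a hypothetical, hand-maintained set
of platforms on which your own testing found NNAPI to underperform; it assumes
the `options` and `nnApiDelegate` variables from the initialization snippet
above:

```java
import android.os.Build;

import org.tensorflow.lite.nnapi.NnApiDelegate;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical exclusion list; populate it from your own device testing.
// The platform strings below are placeholders, not real platform names.
Set<String> NNAPI_EXCLUDED_PLATFORMS =
    new HashSet<>(Arrays.asList("platform_a", "platform_b"));

// Attach the NNAPI delegate only on non-excluded platforms.
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.P
        && !NNAPI_EXCLUDED_PLATFORMS.contains(boardPlatform)) {
    nnApiDelegate = new NnApiDelegate();
    options.addDelegate(nnApiDelegate);
}
```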
For advanced developers, consider maintaining this list via a remote
configuration system. The TensorFlow team is actively working on ways to
simplify and automate discovering and applying the optimal NNAPI configuration.

### Quantization

Quantization reduces model size by using 8-bit integers or 16-bit floats
instead of 32-bit floats for computation. 8-bit integer model sizes are a
quarter of the 32-bit float versions; 16-bit float models are half the size.
Quantization can improve performance significantly, though the process may
trade off some model accuracy.

There are multiple types of post-training quantization techniques available,
but, for maximum support and acceleration on current hardware, we recommend
[full integer quantization](post_training_quantization#full_integer_quantization_of_weights_and_activations).
This approach converts both the weights and the activations into integers. The
quantization process requires a representative dataset to work.

### Use supported models and ops

If the NNAPI delegate does not support some of the ops or parameter
combinations in a model, the framework only runs the supported parts of the
graph on the accelerator. The remainder runs on the CPU, which results in split
execution. Due to the high cost of CPU/accelerator synchronization, this may
result in slower performance than executing the whole network on the CPU alone.

NNAPI performs best when models only use
[supported ops](https://developer.android.com/ndk/guides/neuralnetworks#model).
The following models are known to be compatible with NNAPI:

*   [MobileNet v1 (224x224) image classification (float model download)](https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html)
    [(quantized model download)](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224_quant.tgz) \
    _(image classification model designed for mobile and embedded vision
    applications)_
*   [MobileNet v2 SSD object detection](https://ai.googleblog.com/2018/07/accelerated-training-and-inference-with.html)
    [(download)](https://storage.googleapis.com/download.tensorflow.org/models/tflite/gpu/mobile_ssd_v2_float_coco.tflite) \
    _(object detection model that detects multiple objects with bounding
    boxes)_
*   [MobileNet v1 (300x300) Single Shot Detector (SSD) object detection](https://ai.googleblog.com/2018/07/accelerated-training-and-inference-with.html)
    [(download)](https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip)
*   [PoseNet for pose estimation](https://github.com/tensorflow/tfjs-models/tree/master/posenet)
    [(download)](https://storage.googleapis.com/download.tensorflow.org/models/tflite/gpu/multi_person_mobilenet_v1_075_float.tflite) \
    _(vision model that estimates the poses of one or more persons in an image
    or video)_

NNAPI acceleration is also not supported when the model contains
dynamically-sized outputs. In this case, you will get a warning like:

```none
ERROR: Attempting to use a delegate that only supports static-sized tensors \
with a graph that has dynamic-sized tensors.
```

### Enable NNAPI CPU implementation

A graph that can't be processed completely by an accelerator can fall back to
the NNAPI CPU implementation. However, since this is typically less performant
than the TensorFlow Lite interpreter, this option is disabled by default in the
NNAPI delegate for Android 10 (API Level 29) or above. To override this
behavior, set `setUseNnapiCpu` to `true` in the `NnApiDelegate.Options` object.
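A minimal sketch of that override, using the same delegate setup as in the
initialization section above:

```java
import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.nnapi.NnApiDelegate;

// Allow NNAPI to fall back to its own CPU implementation for graph parts
// that no accelerator can handle (disabled by default on Android 10+).
NnApiDelegate.Options nnApiOptions = new NnApiDelegate.Options();
nnApiOptions.setUseNnapiCpu(true);

NnApiDelegate nnApiDelegate = new NnApiDelegate(nnApiOptions);
Interpreter.Options options =
    new Interpreter.Options().addDelegate(nnApiDelegate);
```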