# TensorFlow Lite NNAPI delegate

The
[Android Neural Networks API (NNAPI)](https://developer.android.com/ndk/guides/neuralnetworks)
is available on all Android devices running Android 8.1 (API level 27) or
higher. It provides acceleration for TensorFlow Lite models on Android devices
with supported hardware accelerators including:

*   Graphics Processing Unit (GPU)
*   Digital Signal Processor (DSP)
*   Neural Processing Unit (NPU)

Performance will vary depending on the specific hardware available on the
device.

This page describes how to use the NNAPI delegate with the TensorFlow Lite
Interpreter in Java and Kotlin. For the Android C APIs, please refer to the
[Android Native Development Kit documentation](https://developer.android.com/ndk/guides/neuralnetworks).

## Trying the NNAPI delegate on your own model

### Gradle import

The NNAPI delegate is part of the TensorFlow Lite Android interpreter, release
1.14.0 or higher. You can import it into your project by adding the following
to your module's Gradle file:

```groovy
dependencies {
    implementation 'org.tensorflow:tensorflow-lite:2.0.0'
}
```

### Initializing the NNAPI delegate

Add the code to initialize the NNAPI delegate before you initialize the
TensorFlow Lite interpreter.

Note: Although NNAPI is supported from API level 27 (Android Oreo MR1), support
for operations improved significantly from API level 28 (Android Pie) onwards.
As a result, we recommend that developers use the NNAPI delegate on Android Pie
or above for most scenarios.

```java
import android.os.Build;

import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.nnapi.NnApiDelegate;

Interpreter.Options options = new Interpreter.Options();
NnApiDelegate nnApiDelegate = null;
// Add the NNAPI delegate only on Android Pie (API level 28) or above.
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.P) {
    nnApiDelegate = new NnApiDelegate();
    options.addDelegate(nnApiDelegate);
}

// Initialize the TFLite interpreter.
try {
    tfLite = new Interpreter(loadModelFile(assetManager, modelFilename), options);
} catch (Exception e) {
    throw new RuntimeException(e);
}

// Run inference
// ...

// Close the interpreter first, then release the delegate's resources.
tfLite.close();
if (nnApiDelegate != null) {
    nnApiDelegate.close();
}
```
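
The same flow in Kotlin looks like the following minimal sketch. It assumes the
same `loadModelFile` helper, `assetManager`, and `modelFilename` as the Java
example above:

```kotlin
import android.os.Build
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.nnapi.NnApiDelegate

val options = Interpreter.Options()
var nnApiDelegate: NnApiDelegate? = null
// Add the NNAPI delegate only on Android Pie (API level 28) or above.
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.P) {
    nnApiDelegate = NnApiDelegate()
    options.addDelegate(nnApiDelegate)
}

// Initialize the TFLite interpreter.
val tfLite = Interpreter(loadModelFile(assetManager, modelFilename), options)

// Run inference
// ...

// Close the interpreter first, then release the delegate's resources.
tfLite.close()
nnApiDelegate?.close()
```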

## Best practices

### Test performance before deploying

Runtime performance can vary significantly depending on model architecture,
size, operations, hardware availability, and runtime hardware utilization. For
example, if an app heavily utilizes the GPU for rendering, NNAPI acceleration
may not improve performance due to resource contention. We recommend running a
simple performance test using the debug logger to measure inference time. Run
the test on several phones with different chipsets (from different
manufacturers, or different models from the same manufacturer) that are
representative of your user base before enabling NNAPI in production.
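
As a rough illustration, the Kotlin sketch below times a single inference call
and writes the result to the debug log. The `tfLite`, `inputBuffer`, and
`outputBuffer` objects are placeholders for your own interpreter and I/O
buffers; in practice, discard a few warm-up runs and average over many
iterations before drawing conclusions:

```kotlin
import android.os.SystemClock
import android.util.Log

// Time one inference call and log the elapsed wall-clock time.
val startMs = SystemClock.elapsedRealtime()
tfLite.run(inputBuffer, outputBuffer)
val elapsedMs = SystemClock.elapsedRealtime() - startMs
Log.d("TfLiteInference", "Inference took $elapsedMs ms")
```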

For advanced developers, TensorFlow Lite also offers
[a model benchmark tool for Android](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark).

### Create a device exclusion list

In production, there may be cases where NNAPI does not perform as expected. We
recommend that developers maintain a list of devices that should not use NNAPI
acceleration in combination with particular models. You can create this list
based on the value of `"ro.board.platform"`, which you can retrieve using the
following code snippet:

```java
import android.util.Log;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

String boardPlatform = "";

try {
    Process sysProcess =
        new ProcessBuilder("/system/bin/getprop", "ro.board.platform")
            .redirectErrorStream(true)
            .start();

    BufferedReader reader =
        new BufferedReader(new InputStreamReader(sysProcess.getInputStream()));
    String currentLine = null;

    // The property value is the last line printed by getprop.
    while ((currentLine = reader.readLine()) != null) {
        boardPlatform = currentLine;
    }
    sysProcess.destroy();
} catch (IOException e) {
    Log.e("Board Platform", "Failed to read ro.board.platform", e);
}

Log.d("Board Platform", boardPlatform);
```
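
You can then consult this value before adding the delegate. Below is a minimal
Kotlin sketch with a hypothetical deny list; the platform names are
placeholders, and `boardPlatform` and `options` are assumed from the snippets
above. Populate the list from your own test results:

```kotlin
// Hypothetical chipsets where NNAPI underperformed in your own testing.
val nnApiDenyList = setOf("examplechipset1", "examplechipset2")

// Only add the NNAPI delegate on supported OS versions and allowed chipsets.
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.P &&
    boardPlatform !in nnApiDenyList) {
    options.addDelegate(NnApiDelegate())
}
```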

For advanced developers, consider maintaining this list via a remote
configuration system. The TensorFlow team is actively working on ways to
simplify and automate discovering and applying the optimal NNAPI configuration.

### Quantization

Quantization reduces model size by using 8-bit integers or 16-bit floats
instead of 32-bit floats for computation. An 8-bit integer model is a quarter
of the size of its 32-bit float version; a 16-bit float model is half the size.
Quantization can improve performance significantly, though the process may
trade off some model accuracy.

There are multiple types of post-training quantization techniques available,
but, for maximum support and acceleration on current hardware, we recommend
[full integer quantization](post_training_quantization#full_integer_quantization_of_weights_and_activations).
This approach converts both the weights and the operations into integers. This
quantization process requires a representative dataset to work.

### Use supported models and ops

If the NNAPI delegate does not support some of the ops or parameter
combinations in a model, the framework runs only the supported parts of the
graph on the accelerator. The remainder runs on the CPU, which results in split
execution. Due to the high cost of CPU/accelerator synchronization, this may
result in slower performance than executing the whole network on the CPU alone.

NNAPI performs best when models use only
[supported ops](https://developer.android.com/ndk/guides/neuralnetworks#model).
The following models are known to be compatible with NNAPI:

*   [MobileNet v1 (224x224) image classification (float model download)](https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html)
    [(quantized model download)](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224_quant.tgz)
    \
    _(image classification model designed for mobile and embedded vision
    applications)_
*   [MobileNet v2 SSD object detection](https://ai.googleblog.com/2018/07/accelerated-training-and-inference-with.html)
    [(download)](https://storage.googleapis.com/download.tensorflow.org/models/tflite/gpu/mobile_ssd_v2_float_coco.tflite)
    \
    _(object detection model that detects multiple objects with bounding
    boxes)_
*   [MobileNet v1 (300x300) Single Shot Detector (SSD) object detection](https://ai.googleblog.com/2018/07/accelerated-training-and-inference-with.html)
    [(download)](https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip)
*   [PoseNet for pose estimation](https://github.com/tensorflow/tfjs-models/tree/master/posenet)
    [(download)](https://storage.googleapis.com/download.tensorflow.org/models/tflite/gpu/multi_person_mobilenet_v1_075_float.tflite)
    \
    _(vision model that estimates the poses of one or more persons in an image
    or video)_

NNAPI acceleration is also not supported when the model contains dynamically
sized outputs. In this case, you will get a warning like:

```none
ERROR: Attempting to use a delegate that only supports static-sized tensors \
with a graph that has dynamic-sized tensors.
```

### Enable NNAPI CPU implementation

A graph that can't be processed completely by an accelerator can fall back to
the NNAPI CPU implementation. However, since this is typically less performant
than the TensorFlow Lite interpreter, this option is disabled by default in the
NNAPI delegate for Android 10 (API level 29) or above. To override this
behavior, call `setUseNnapiCpu(true)` on the `NnApiDelegate.Options` object.
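
For example, a minimal Kotlin sketch (the interpreter `options` object is
assumed from the initialization example above):

```kotlin
import org.tensorflow.lite.nnapi.NnApiDelegate

// Re-enable the NNAPI CPU fallback, which is off by default on Android 10+.
val nnApiOptions = NnApiDelegate.Options()
nnApiOptions.setUseNnapiCpu(true)

val nnApiDelegate = NnApiDelegate(nnApiOptions)
options.addDelegate(nnApiDelegate)
```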